Getting To A Household
Once the desired construction of the overall sample, e.g. total sample size, plus the number of interviews to be conducted in each sub-stratum has been decided upon, the next step is to translate that desired sample into actual interviews.
There are at least two very different ways of proceeding at this point. The key decision is whether to use a random probability sample or a quota sample.
Random Probability Sample
Here, every person in the population has an equal and known chance of being selected in the final sample. This presumes that the size of the overall population is definitely known. If that size is 'n', then each person's probability of selection on any single draw = 1 / n.
If there is a list of all the people living in a given population, pure random probability sampling means randomly drawing names from that list until the desired number is reached. Or, if the sample has been stratified into subsamples (e.g. urban and rural people), X number of names from the list of urban people, and Y number of names from the list of rural people, would then be drawn. Once the sample is drawn, these persons would simply be visited, contacted by telephone, or mailed the questionnaire.
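The drawing of names described above can be sketched in a few lines of Python. This is only an illustration: the register, the stratum names, and the function names are hypothetical, and a real study would work from an actual population list.

```python
import random

def draw_simple_random_sample(names, sample_size, seed=None):
    """Draw sample_size distinct names; every name is equally likely."""
    rng = random.Random(seed)
    return rng.sample(names, sample_size)

def draw_stratified_sample(strata, sizes, seed=None):
    """Draw sizes[stratum] distinct names independently from each stratum."""
    rng = random.Random(seed)
    return {
        stratum: rng.sample(names, sizes[stratum])
        for stratum, names in strata.items()
    }

# Hypothetical population register, already split into two strata.
urban = [f"urban_person_{i}" for i in range(1000)]
rural = [f"rural_person_{i}" for i in range(600)]

sample = draw_stratified_sample(
    {"urban": urban, "rural": rural},
    {"urban": 50, "rural": 30},  # X and Y from the text
    seed=1,
)
print(len(sample["urban"]), len(sample["rural"]))
```

Fixing the seed makes the draw reproducible, which is useful if the selection ever has to be audited.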
Even where there is a complete list of every person in the population, personal interviews using pure random sampling tend to be inordinately expensive. Getting interviewers out to each spot selected by the random sampling procedure, regardless of how remote it is from the other interview sites, is costly. Thus, most personal interview strategies use clustered random sampling. That is, travel costs are minimised by sending a group of interviewers to some randomly selected location and then conducting a series of interviews at that location.
Clustered sampling is widely used because it reduces costs, but also because, very often, a list of names is not available. Many countries, provinces, or municipalities have no such list or, if they do, will not share it with a researcher.
Thus, although the size of the overall population and the number of people living in various regions or in various subgroups might be known, there may not actually be a list of individual names. Clustered sampling around sampling points helps researchers get to individual households in a way that maintains randomness and an equal probability of selection.
This involves the selection of a series of what are called 'primary sampling units' (PSUs). PSUs are the smallest units from which final sampling points will be randomly drawn. PSUs consist of the smallest geographical units for which there is reliable population data (and for most surveys, this means the population 18 years and older). In some countries with good census data, these may be called 'Enumerator Areas.'
Final sampling points cannot be randomly pulled from these PSUs because the PSUs will almost always have different population sizes. Even where there are census determined Enumerator Areas consisting of an equal set number of households each (for example, in Zimbabwe, EAs have 100 households each), the number of people in each household will differ. Thus, each potential PSU must be weighted by the actual number of people living in it. That is, the chance of selecting a final sampling point from a PSU must be proportionate to the actual population size of the PSU.
Once each PSU has been weighted by its population size, final sampling points can then be randomly selected from the list of PSUs. The actual number of final sampling points is determined by the number of interviews to be conducted at each point and the total sample size. Most surveys conduct between five and seven interviews at each point. Thus, if five interviews will be done at each point, and the overall sample size is 2500, a list of 500 final sampling points must be randomly selected.
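The arithmetic and the weighted draw described above can be sketched as follows. The PSU names and populations are invented for illustration, and drawing with replacement via `random.choices` is an assumption made here so that a populous PSU can be selected more than once; practitioners may prefer other proportional-to-size schemes.

```python
import random
from collections import Counter

def number_of_sampling_points(total_sample_size, interviews_per_point):
    """Number of final sampling points needed for the whole sample."""
    if total_sample_size % interviews_per_point != 0:
        raise ValueError("sample size must be a multiple of interviews per point")
    return total_sample_size // interviews_per_point

def select_sampling_points(psu_populations, n_points, seed=None):
    """Pick n_points PSUs with probability proportional to population.

    Draws are made with replacement, so the result maps each selected
    PSU to the number of sampling points it receives.
    """
    rng = random.Random(seed)
    names = list(psu_populations)
    weights = [psu_populations[name] for name in names]
    return Counter(rng.choices(names, weights=weights, k=n_points))

# Hypothetical PSUs with their adult populations.
psu_populations = {"psu_a": 12000, "psu_b": 3000, "psu_c": 800, "psu_d": 45000}

n_points = number_of_sampling_points(2500, 5)
print(n_points)  # 500
points = select_sampling_points(psu_populations, n_points, seed=7)
print(points.most_common())
```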
Now we know where we want to go. For instance, a generated list might contain 350 suburbs (some populous ones might be selected more than once) and 150 rural magisterial districts. Survey researchers will then find maps for each of those areas and randomly select a point in each one. This can get quite elaborate: some researchers lay a transparency of randomly numbered points over the map, select a number at random, and then look for the street it overlays. That is where they will finally send the interviewers.
In many areas, no good maps exist. Or, rural maps might be so large that they only show the locations of towns, but not streets within the towns. In this case, one might resort to a rule such as starting at some common point, such as a church, school, municipal building, or water tap.
Once interviewers know what point they have to go to, they should follow a set of rules that allows them to start picking houses, again at random. For instance, they might go to the agreed-upon point, face the sun, or face east, and then proceed ten houses, and then interview at every fifth house. The rule should be random, but all your interviewers should follow the same rule. The whole point is that the interviewer should play no role in the selection of the household.
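As a small illustration of such a walking rule, the sketch below lists which houses an interviewer would stop at, counting houses from the starting point. It assumes the first interview takes place at the tenth house and every fifth house thereafter; the text leaves that detail open, and the function name and parameters are hypothetical.

```python
def houses_to_visit(n_interviews, first_house=10, interval=5):
    """Return the house counts (from the start point) at which to interview."""
    return [first_house + interval * i for i in range(n_interviews)]

print(houses_to_visit(5))  # [10, 15, 20, 25, 30]
```

Because the rule is fixed in advance, two interviewers starting from the same point would arrive at the same houses.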
The very last step involves selecting an actual, real live respondent. Again, giving every person an equal chance of selection demands that interviewers don't only speak to people who answer the door, or the telephone. If interviewers are working from a sample chosen from some grand population register, then they need to speak to the specific person whose name appears on the list.
If there isn't such a list, once interviewers are inside the door, or have someone on the phone, they will need to 'enumerate' the household, or make a list of people who live in the household (and, normally, are citizens above the age of eighteen). Then they need to choose one name at random and interview that, and only that, person. A common way to select that person randomly is to ask which person in the household had the most recent birthday. In rural areas, people may often be irritated at not getting the chance to express themselves (especially if the head of household is not selected, particularly if it is a man), and they may not understand the birthday method. One visible way to display the logic of random selection is to distribute a series of colour-coded cards to everyone who is eligible, then gather them and ask someone in the house to randomly pull a card from the stack: the person who had held that card is the one to interview.
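The enumerate-then-draw step can be sketched as below. The household records and field names are hypothetical, and eligibility is assumed here to mean citizens aged 18 or older; the random draw plays the same role as the birthday or card method.

```python
import random

def select_respondent(household, rng, min_age=18):
    """Enumerate eligible members and pick exactly one at random.

    Returns None when nobody in the household is eligible.
    """
    eligible = [
        person for person in household
        if person["age"] >= min_age and person["citizen"]
    ]
    if not eligible:
        return None
    return rng.choice(eligible)

# Hypothetical household as listed by the interviewer.
household = [
    {"name": "head of household", "age": 52, "citizen": True},
    {"name": "spouse", "age": 49, "citizen": True},
    {"name": "daughter", "age": 19, "citizen": True},
    {"name": "son", "age": 15, "citizen": True},
]

rng = random.Random(3)
respondent = select_respondent(household, rng)
print(respondent["name"])
```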
However, not every door knocked upon nor every telephone call placed will result in a successfully completed interview. Many people will not be at home, many will be at home but remain inaccessible for a variety of reasons, and many people will simply refuse to speak to interviewers. As mentioned above, it is important that interviewers, to the greatest extent possible, not allow people to select themselves out of a sample. This is because those who are not at home or are unwilling to participate are likely to be different from the rest of the sample in important ways. The people more likely to be at home, especially if interviews are conducted during weekdays, are disproportionately likely to be young people, unemployed, housewives, and the elderly. Those unwilling to talk to interviewers tend to be more alienated. Those types of people need to be represented in any sample. This is especially true if alienation is likely to be related to the topic of interest, like voting.
'Non-response' can play havoc with the representativeness of a sample. In the US, 'non-response' rates have doubled since the 1950s, going from between 12% and 22% to anywhere from 30% to 55% for personal interviews, and 25% to 35% for phone samples. In South Africa, non-response rose by well over 100% in some conservative white communities for surveys conducted in 1993 and 1994.
One trick often used is to correct for non-response by 'weighting' the obtained responses according to known census statistics. So, if not enough middle-aged men were actually interviewed, the responses of those middle-aged men may be 'weighted upwardly' by some factor. So, for example, if there are only half as many of this group as we need in the realised sample, we simply multiply each case by 2.
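The weighting factor is simply the share a group should have (from the census) divided by the share it actually has in the realised sample. A minimal sketch, using invented group names and figures:

```python
def post_stratification_weights(population_shares, sample_counts):
    """Weight for each group = census share / realised sample share."""
    total = sum(sample_counts.values())
    weights = {}
    for group, target_share in population_shares.items():
        realised_share = sample_counts[group] / total
        weights[group] = target_share / realised_share
    return weights

# Hypothetical figures: middle-aged men should be 20% of the sample,
# but only 100 of the 1000 completed interviews (10%) are with them.
population_shares = {"middle_aged_men": 0.2, "everyone_else": 0.8}
sample_counts = {"middle_aged_men": 100, "everyone_else": 900}

weights = post_stratification_weights(population_shares, sample_counts)
for group, weight in weights.items():
    print(group, round(weight, 3))
```

In this example the under-represented group, present at half its required share, receives a weight of 2, while the over-represented remainder is weighted slightly below 1, so that the weighted sample still sums to the original number of interviews.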
This is problematic, however, because it assumes that those who were not included in the sample or refused to be contacted are similar to those questioned across the entire range of attitudes tapped by the survey. However, the very fact that a person was out of the home (probably working or shopping), or refused to speak to the interviewer, probably makes them different from those people who are at home, or want to speak with the interviewer.
There are a few things that can be done to minimise the incidence of those who refuse to speak. Interviewers need to be extensively trained so that they are as courteous as possible. The questionnaire should also feature an introduction that makes the survey as interesting as possible to the potential respondent, and conveys to them the importance attached to their views. Finally, interviewers should ask if they have come at a convenient time, and if not, offer to make an appointment at a better time when the respondent is not busy and can spend some time concentrating on the questions.
Interviewers can try to minimise the effect of people not being at home in several ways. First of all, they should try to do a large share of interviews in the evening and on weekends. Weekdays are difficult because workers are out of the home, although housewives may have more time to speak to interviewers then. The absolute worst time seems to be supper time, when people are busy either making dinner, or eating, and are most irritated at being disturbed.
Second, interviewers can devote a lot of attention to what are called 'call-backs.' If the person on the list, or the person randomly selected with the birthday or card method, is not available, interviewers should ask when that person is likely to return and then come back at that point in order to get the interview. Most survey companies require interviewers to make at least two, if not three, call-backs to get the originally selected person. Some large surveys providing marketing information on media and product usage actually require four call-backs.
Only once the interviewer has made the required number of call-backs and still had no luck, should they be allowed to 'substitute' another person for the original respondent. Furthermore, they should not substitute with someone else from that house. Rather, they need to follow some rule, such as going two or three houses to the right or to the left, or dialling a new number below or above the original number in the telephone listing, and go through the whole process there.
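The call-back and substitution rule amounts to a small decision procedure, sketched below. The function name, the return labels, and the default of three required call-backs are illustrative assumptions; each survey sets its own number.

```python
def next_step(callbacks_made, interview_completed, required_callbacks=3):
    """Decide what the interviewer should do for the selected respondent."""
    if interview_completed:
        return "done"
    if callbacks_made < required_callbacks:
        # The originally selected person must still be pursued.
        return "call back"
    # Only now may a substitute be sought, and never in the same house.
    return "substitute at another household"

print(next_step(callbacks_made=1, interview_completed=False))
print(next_step(callbacks_made=3, interview_completed=False))
```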
Again, the whole point is to make an extra effort to ensure that those likely to be out of the home are not easily allowed to slip out of the sample, and that they are not easily substituted with the types of people that are more likely to be found at home.
The advantage of a random probability sample is that it allows researchers to take advantage of the mathematical laws of sampling for the purpose of generalising sample results to the larger population. These laws tell us that the average (mean) of any randomly drawn sample will tend to equal the mean of the overall population from which it is drawn. More specifically, for any given sample size, these laws provide the formulae to calculate the margin of error around any sample estimate. That is, for a given sample size, a sample estimate will be within plus or minus a calculable margin of the true mean of the overall population 95 per cent of the time. This is because, if a large number of samples are drawn, the laws of probability indicate that about five per cent would fall outside the normal margin of error. However, 95 per cent of the samples would fall within a calculable range, or band, around the true population mean. The larger the sample, the narrower that band.
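To make the narrowing band concrete, the sketch below computes the 95 per cent margin of error for a percentage (for example, the share of respondents giving a particular answer). It assumes a simple random sample and uses the standard formula 1.96 * sqrt(p(1 - p) / n), with p = 0.5 as the most cautious case; clustered designs generally produce somewhat wider margins than this formula suggests.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """95% margin of error for a proportion under simple random sampling."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

for n in (625, 2500, 10000):
    print(n, f"+/- {100 * margin_of_error(n):.1f} percentage points")
```

With a sample of 2500 the margin is about two percentage points; quadrupling the sample size only halves it.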
Quotas
An alternative method is the quota sample. Here, the overall sample is constructed to represent the overall population along all the important lines of distinction. For instance, a decision is made that the sample should have certain percentages from each province and from each city, certain percentages of men and women, of each language group, and of each race group. However, the final selection of respondents is left up to the interviewer. Each interviewer is given a quota to fill in their area; that is, a list of the number of people they must find and interview who fit different demographic categories. So an interviewer may be told that they must find five African men and six African women who live in urban areas, and seven African men and eight African women who live in rural areas.
However, they are not told which houses to go to, or streets to go to, or given any random process to follow. They must simply find people that fit the desired categories. Because interviewers are relieved of the duty to go through all the random processes described above, they are able to obtain the desired number of interviews much more quickly and with far lower travel costs. This makes a quota sample considerably cheaper than a random probability sample.
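An interviewer's quota sheet can be thought of as a set of counters that are ticked off as matching people are found. The sketch below uses the example quota given above; the class and method names are hypothetical.

```python
from collections import Counter

class QuotaSheet:
    """Tracks how many interviews are still needed in each category."""

    def __init__(self, quotas):
        self.remaining = Counter(quotas)

    def record(self, category):
        """Count an interview if the category still has room; else refuse."""
        if self.remaining[category] <= 0:
            return False
        self.remaining[category] -= 1
        return True

    def complete(self):
        return all(count == 0 for count in self.remaining.values())

# The example quota from the text: (group, sex, area) -> interviews required.
sheet = QuotaSheet({
    ("African", "man", "urban"): 5,
    ("African", "woman", "urban"): 6,
    ("African", "man", "rural"): 7,
    ("African", "woman", "rural"): 8,
})

print(sheet.record(("African", "man", "urban")))   # True: quota still open
print(sum(sheet.remaining.values()))               # 25 interviews left to find
```

Nothing in this bookkeeping is random: which matching person gets interviewed is entirely the interviewer's choice.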
However, the major drawback is that, because the equal and known probability of inclusion that characterises a probability sample has been dispensed with, mathematical theories of probability cannot be applied to make any inferences from a quota sample to the overall population. The frequency of responses from a quota sample can be calculated, but strictly speaking, the degree to which those results are representative of the true values in the overall population cannot be determined.