Getting To a Household
Once the desired
construction of the overall sample (the total sample size, plus the number of
interviews to be conducted in each sub-stratum) has been decided upon, the next
step is to translate that desired sample into actual interviews.
There are at least two very
different ways of proceeding at this point. The key decision is whether to use
a random probability sample or a quota sample.
Random Probability Sample
Here, every person in the
population has an equal and known chance of being selected in the final sample.
This presumes that the size of the overall population is definitely known. If
that size is "n", then each person's probability of being picked on any single
random draw = 1 / n.
If there is a list of all
the people living in a given population, pure random probability sampling means
randomly drawing names from that list until the desired
number is reached. Or, if the sample has been stratified into subsamples (e.g.
urban and rural people), X number of names from the list of urban people, and Y
number of names from the list of rural people would then be drawn. Once the
sample is drawn, these persons would simply be visited, contacted by telephone,
or mailed the questionnaire.
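
The drawing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population lists, the stratum sizes (50 urban, 30 rural) and the function names are hypothetical and do not come from any real survey.

```python
import random


def draw_simple_sample(names, sample_size, seed=None):
    """Draw sample_size distinct names, each with an equal chance."""
    rng = random.Random(seed)
    return rng.sample(names, sample_size)


def draw_stratified_sample(strata, sizes, seed=None):
    """Draw a separate simple random sample from each stratum's list."""
    rng = random.Random(seed)
    return {name: rng.sample(strata[name], sizes[name]) for name in strata}


# Hypothetical population lists, for illustration only.
urban = [f"urban_{i}" for i in range(1000)]
rural = [f"rural_{i}" for i in range(600)]

sample = draw_stratified_sample(
    {"urban": urban, "rural": rural},
    {"urban": 50, "rural": 30},  # X urban names and Y rural names
    seed=1,
)
```

Fixing the seed makes the draw reproducible, which can help when the selection has to be documented or audited later.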
Even where there is a
complete list of every person in the population, personal interviews using pure
random sampling tend to be inordinately expensive. Getting interviewers out to
each spot randomly selected by the random sampling procedure regardless of how
remote it is from the other interview sites is costly. Thus, most personal
interview strategies use clustered random sampling. That is, travel
costs are minimized by sending a group of interviewers to some randomly
selected location and then conducting a series of interviews at that location.
Clustered sampling is
widely used because it reduces costs, but also because, very often, a list of
names is not available. Many countries, provinces, or municipalities have
no such list, or if they do, will not share it with a researcher.
Thus, although the size of the
overall population and the number of people living in various regions or in
various subgroups might be known, there may not actually be a list of
individual names. Clustered sampling around sampling points helps researchers
get to individual households in a way that maintains randomness and an equal
probability of selection.
This involves the selection
of a series of what are called "primary sampling units" (PSUs). PSUs
are the smallest units from which final sampling points will be randomly drawn.
PSUs consist of the smallest geographical units for which there is reliable
population data (and for most surveys, this means the population 18 years and
older). In some countries with good census data, these may be called
"Enumerator Areas."
Final sampling points cannot
simply be pulled at random from these PSUs because the PSUs will almost always have
different population sizes. Even where there are census-determined Enumerator
Areas consisting of a set number of households each (for example, in Zimbabwe,
EAs have 100 households each), the number of people in each household will
differ. Thus, each potential PSU must be weighted by the actual number of
people living in it. That is, the chance of selecting a final sampling point
from a PSU must be proportionate to the actual population size of the PSU.
Once each PSU has been
weighted by its population size, final sampling points can then be randomly
selected from the list of PSUs. The actual number of final sampling points is
determined by the number of interviews to be conducted at each point and the
total sample size. Most surveys conduct between five and seven interviews at
each point. Thus, if five interviews will be done at each point, and the
overall sample size is 2500, a list of 500 final sampling points must be randomly
selected.
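
The text does not say how the weighted draw is carried out. One common way, used here purely as an assumed illustration, is systematic selection with probability proportionate to size: the PSU populations are laid end to end, and points are taken at a fixed interval from a random start. The PSU names and populations below are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def number_of_points(total_sample, interviews_per_point):
    """Number of final sampling points needed for the whole sample."""
    return total_sample // interviews_per_point


def select_sampling_points(psu_populations, n_points, seed=None):
    """Systematic selection with probability proportionate to size.

    Returns one PSU name per sampling point; a populous PSU may appear
    more than once.
    """
    rng = random.Random(seed)
    names = list(psu_populations)
    # Running totals: PSU i covers the range [cumulative[i-1], cumulative[i]).
    cumulative = list(accumulate(psu_populations[name] for name in names))
    interval = cumulative[-1] / n_points
    start = rng.random() * interval  # random start inside the first interval
    points = []
    for k in range(n_points):
        position = start + k * interval
        # Guard against a floating-point position landing on the total.
        index = min(bisect_right(cumulative, position), len(names) - 1)
        points.append(names[index])
    return points


# Hypothetical PSUs with their adult populations, for illustration only.
psus = {"A": 9000, "B": 1000}
points = select_sampling_points(psus, n_points=10, seed=3)
```

With five interviews per point and a sample of 2500, `number_of_points(2500, 5)` gives the 500 points mentioned above. In the toy example, PSU "A" holds nine tenths of the population and so receives nine of the ten points.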
Now we know where we want
to go. For instance, a generated list might contain 350 suburbs (some populous
ones may be selected more than once) and 150 rural magisterial districts.
Survey researchers will then find maps for each of those areas, and then
randomly select a point in a suburb. This can get quite elaborate, as some
researchers will lay over a transparency of randomly numbered points, then
select a number at random, and then look for the street on the map it overlays.
That is where they finally will send the interviewers.
In many areas, no good maps
exist. Or, rural maps might cover such a large area that they only show the locations of
towns, but not streets within the towns. In this case, one might resort to a
rule such as starting at some common point, such as a church, school, municipal
building, or water tap.
Once interviewers know which
point they have to go to, they should follow a set of rules that allows them
to start picking houses, again at random. For instance, they might go to the agreed
upon point, face the sun, or face east, then proceed ten houses, and then
interview at every fifth house. The rule should be random, but all your
interviewers should follow the same rule. The whole point is that the interviewer
should play no role in the selection of the household.
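
A walking rule of this kind can be written down as a simple counting scheme. The sketch below assumes one possible reading of the example, in which the first interview takes place at the tenth house counted from the starting point and every fifth house after that; the function name and defaults are hypothetical.

```python
def houses_to_visit(n_interviews, first_house=10, interval=5):
    """Return the house counts, along the walking route, at which to interview.

    Houses are counted from 1, starting at the agreed point and walking
    in the agreed direction.
    """
    return [first_house + interval * k for k in range(n_interviews)]


route = houses_to_visit(5)
```

Because the counts are fixed before the interviewer sets out, the interviewer has no discretion over which households are approached.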
The very last step involves
selecting an actual, real live respondent. Again, giving every person an equal
chance of selection demands that interviewers do not speak only to the people who
answer the door or the telephone. If interviewers are working from a sample
chosen from some grand population register, then they need to speak to the
specific person whose name appears on the list.
If there isn't such a list,
once interviewers are inside the door, or have someone on the phone, they will
need to "enumerate" the household, or make a list of people who live
in the household (and, normally, are citizens above the age of eighteen). Then
they need to choose one name at random and interview that and only
that person. A common way to select that person randomly is to ask which
person in the household had the most recent birthday. In rural areas, people
may often be irritated at not getting the chance to express themselves
(especially if the head of household is not selected, particularly if it is a
man), and they may not understand the birthday method. One visible way to
display the logic of random selection is to distribute a series of colour coded
cards to everyone who is eligible, then gather them and ask someone in the
house to randomly pull a card from the stack: the person who had held that card
is the one to interview.
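
The enumerate-then-draw step can be sketched as follows. It is a minimal illustration, assuming eligibility means citizens aged 18 or older; the household records and field names are hypothetical. The random draw stands in for the card method, giving every eligible member the same chance.

```python
import random


def select_respondent(household, min_age=18, rng=None):
    """Pick one eligible household member at random, or None if nobody qualifies."""
    rng = rng or random.Random()
    # Enumerate the household: keep only eligible members (assumed rule).
    eligible = [p for p in household if p["citizen"] and p["age"] >= min_age]
    if not eligible:
        return None
    return rng.choice(eligible)


# Hypothetical household, for illustration only.
household = [
    {"name": "head of household", "age": 52, "citizen": True},
    {"name": "spouse", "age": 49, "citizen": True},
    {"name": "child", "age": 12, "citizen": True},
]
respondent = select_respondent(household, rng=random.Random(7))
```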
However, not every door we
knock upon nor every telephone call placed will result in a successfully
completed interview. Many people will not be at home, many will be at home but
remain inaccessible for a variety of reasons, and many people will simply
refuse to speak to interviewers. As mentioned above, it is important that
interviewers, to the greatest extent possible, not allow people to select
themselves out of a sample. This is because those who are not at home or are
unwilling to participate are likely to be different from the rest of the
population in important ways. The people more likely to be at home, especially
if interviews are conducted during weekdays, are disproportionately likely to
be young people, the unemployed, housewives and the elderly. Those unwilling to
talk to interviewers tend to be more alienated. People who are away from home
or alienated still need to be represented in any sample. This is especially
true if alienation is likely to be related to the topic of interest, like voting.
"Non-response"
can play havoc with the representativeness of a sample. In the U.S.,
"non-response" rates have doubled since the 1950s, going from
between 12% and 22% to anywhere from 30% to 55% for personal interviews and 25%
to 35% for telephone samples. In South Africa, non-response rose by well over 100%
in some conservative white communities for surveys conducted in 1993 and 1994.
One method often used is to
correct for non-response by "weighting" the obtained responses
according to known census statistics. So, if not enough middle-aged men were
actually interviewed, the responses of those middle-aged men may be
"weighted upwardly" by some factor. So, for example, if there are
only half as many of this group as we need in the realized sample, we simply
multiply each case by 2.
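
In general, the weight for a group is its share of the population divided by its share of the realized sample. The sketch below shows that ratio; the group labels, census shares and interview counts are hypothetical figures chosen for illustration.

```python
def nonresponse_weights(population_shares, realized_counts):
    """Weight per group = population share / share of the realized sample."""
    total = sum(realized_counts.values())
    return {
        group: population_shares[group] / (realized_counts[group] / total)
        for group in realized_counts
    }


# Hypothetical figures: the census says this group is 25% of adults,
# but it makes up only 200 of the 1000 completed interviews.
weights = nonresponse_weights(
    {"middle-aged men": 0.25, "everyone else": 0.75},
    {"middle-aged men": 200, "everyone else": 800},
)
```

Here the under-represented group is weighted up (1.25) and the over-represented group is weighted down (0.9375), so the weighted sample matches the census shares.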
This is problematic,
however, because it assumes that those who were not included in the sample or
refused to be contacted are similar to those questioned across the entire range
of attitudes tapped by the survey. However, the very fact that a person
was out of the home (probably working or shopping), or refused
to speak to the interviewer, probably makes them different from those
people who are at home, or want to speak with the interviewer.
There are a few things that
can be done to minimise the incidence of those who refuse to speak.
Interviewers need to be extensively trained so that they are as courteous as
possible. The questionnaire should also feature an introduction that makes the
survey as interesting as possible to the potential respondent, and
conveys to them the importance attached to their views. Finally, interviewers
should ask if they have come at a convenient time, and if not, offer to make an
appointment at a better time when the respondent is not busy and can spend some
time concentrating on the questions.
Interviewers can try to
minimize the effect of people not being at home in several ways. First of all,
they should try to do a large share of interviews in the evening and at
weekends. Weekdays are difficult because workers are out of the home, although
housewives may have more time to speak to interviewers. The very worst time seems
to be supper time, when people are busy either making dinner, or eating, and
are most irritated at being disturbed.
Secondly, interviewers can
devote a lot of attention to what are called "call-backs." If the
person on the list or the person randomly selected with, for example, the
birthday method is not available, interviewers should ask when that person is
likely to return and then come back at that point in order to get the
interview. Most survey companies require interviewers to make at least two, if
not three "call-backs" to get the originally selected person. Some
large surveys providing marketing information on media and product usage
actually require four call-backs.
Only once the interviewer
has made the required number of call-backs, and still had no luck, should they
be allowed to "substitute" another person for the original
respondent. Furthermore, they should not substitute with someone else from that
house. Rather, they need to follow some rule, such as going two or three houses
to the right or to the left, or dialling a new number below or above the
original number in the telephone listing, and go through the whole process
again.
The whole point is to make
an extra effort to ensure that those likely to be out of the home are not
easily allowed to slip out of the sample, and that they are not easily
substituted with the types of people that are more likely to be found at home.
The advantage of a random
probability sample is that it allows researchers to take advantage of the
mathematical laws of sampling for the purpose of generalizing sample results to
the larger population. These laws tell us that the average (mean) of any
randomly drawn sample will tend to equal the mean of the overall population
from which it is drawn. More specifically, for any given sample size, these
laws provide the formulae to calculate the margin of error around any
sample estimate. That is, for a given sample size, a sample estimate will fall
within plus or minus that margin of error of the true mean of the overall
population 95 per cent of the time. This is because, if a large number of
samples were drawn, the laws of probability indicate that about five per cent
of them would fall outside the margin of error, while 95 per cent would fall
within a calculable range, or band, around the true population mean. The larger
the sample, the narrower that band.
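
For a percentage estimated from a simple random sample, the usual 95 per cent margin of error is 1.96 times the square root of p(1 - p) / n. The sketch below applies that standard formula; it assumes simple random sampling and a large population, so a clustered design of the kind described earlier would normally produce a somewhat wider margin. The function name and defaults are illustrative.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """95% margin of error for a proportion under simple random sampling.

    proportion=0.5 is the most cautious choice, since it gives the widest band.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


moe_2500 = margin_of_error(2500)  # about 0.0196, i.e. roughly 2 percentage points
moe_400 = margin_of_error(400)    # about 0.049, i.e. roughly 5 percentage points
```

The comparison shows the point made above: the larger sample gives the narrower band.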
Quotas
An alternative method is
the quota sample. Here, the overall sample is constructed to represent the
overall population along all the important lines of distinction. For instance,
a decision is made that the sample should have certain percentages from each
province and from each city, certain percentages of men and women, of each
language group, and of each race group. However, the final selection of
respondents is left up to the interviewer.
Each interviewer is given a
quota to fill in their area: that is, a list of the number of people they must
find and interview who fit different demographic categories. So an interviewer
may be told that they must find five African men and six African women who live
in urban areas, and seven African men and eight African women who live in rural
areas. However, they are not told which houses to go to, or streets to go to,
or given any random process to follow. They must simply find people that fit
the desired categories.
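
An interviewer's quota sheet amounts to a set of counters that are ticked off as interviews are completed. The sketch below is one hypothetical way to represent it, using the quota from the example above; the class and category labels are illustrative only.

```python
from collections import Counter


class QuotaSheet:
    """Track how many interviews are still needed in each category."""

    def __init__(self, quotas):
        self.remaining = Counter(quotas)

    def needs(self, category):
        """True if at least one more interview is wanted in this category."""
        return self.remaining[category] > 0

    def record(self, category):
        """Count a completed interview; return False if the quota was already full."""
        if not self.needs(category):
            return False
        self.remaining[category] -= 1
        return True

    def complete(self):
        return all(count == 0 for count in self.remaining.values())


# The quota from the example: (area, sex) -> number of interviews required.
sheet = QuotaSheet({
    ("urban", "man"): 5,
    ("urban", "woman"): 6,
    ("rural", "man"): 7,
    ("rural", "woman"): 8,
})
sheet.record(("urban", "man"))
```

Nothing in this bookkeeping constrains which people the interviewer approaches, which is exactly the difference from the random procedures described earlier.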
Because interviewers are relieved
of the duty to go through all the random processes described above, they are
able to obtain the desired number of interviews much more quickly and with
far lower travel costs. This makes a quota sample considerably cheaper than a
random probability sample.
However, the major drawback
is that, because the equal and known probability of inclusion that
characterizes a probability sample has been dispensed with, mathematical
theories of probability cannot be applied to make any inferences from a quota sample
to the overall population. The frequency of responses from a quota sample can
be calculated, but strictly speaking, the degree to which those results are
representative of the true values in the overall population cannot be
determined.