Surveys are useful when we want to know about large numbers of people. The goal is to interview a smaller number of people (the sample) and to generalize the findings to a larger group of people (the population). Drawing a sample properly is technically complex and usually calls for statistical expertise and computing support. But it is important to understand the basic logic in order to tell a fieldwork company clearly what is wanted and to check adequately on what it actually does.
For what population is information being sought and generalized? Is it all voters? Only likely voters? Only men or only women? Only young or old? Only black or white voters?
Drawing a sample of some larger population can be likened to making soup. When mixing up a big bowl of soup, any good cook will tell you that you only need two or three spoonfuls to get a reasonably reliable idea of what the entire bowl tastes like. Of course, this assumes that the soup has been mixed well and that, as a result, all the salt is not clumped in one corner, or that all the potatoes are not lying at the bottom of the bowl, or that all the garlic has not moved to the side of the bowl. Any of these possibilities would mean that the spoonfuls were likely to be unrepresentative of the whole bowl.
Again, assuming a well-mixed bowl, about the same number of randomly drawn spoonfuls will give a good idea of the taste regardless of whether the soup comes from an ordinary black pot on a home stove or one of those industrial-sized pots in a restaurant. The number of spoonfuls needed may increase slightly, but not nearly as fast as the size of the bowl. In the same way, the precision of a survey depends mainly on the size of the sample rather than on the size of the population.
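The soup analogy can be checked with a little arithmetic. The sketch below is illustrative only: it assumes a simple random sample, a 95 percent confidence level (z = 1.96), the most conservative proportion (p = 0.5), and made-up population sizes. It uses the standard margin-of-error formula for a proportion with the finite population correction.

```python
import math

def margin_of_error(n, population_size, p=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample.

    The finite population correction (fpc) only matters when the sample
    is a sizeable fraction of the population.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return z * standard_error * fpc

# Same sample size, very different (hypothetical) population sizes.
for population_size in (10_000, 1_000_000, 50_000_000):
    moe = margin_of_error(1_000, population_size)
    print(f"N = {population_size:>11,}: +/- {moe * 100:.2f} percentage points")
```

Under these assumptions, a sample of 1,000 gives a margin of error of roughly three percentage points whether the population is one million or fifty million; only when the sample is a large share of the population does the figure shrink noticeably.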
But few populations are "well mixed". There are often groups (or strata) of people whose attitudes differ significantly from those of other people (just as different vegetables and seasonings taste different). These groups are not randomly scattered throughout the population but tend to be clustered together in certain regions, cities or neighbourhoods.
The possibility that a purely random sample would miss, or under-represent, any of these groups or strata should be reduced as far as possible. In effect, while attempting to draw a sample that is representative of the whole population (or bowl of soup), one will probably also want to "stratify" the sample so as to draw small subsamples of each desired subgroup (thus ensuring adequate subsamples of potatoes, rice, and tomatoes).
This means paying attention to representing people of all race and language groups, all regions, rich and poor, and urban and rural. Usually, these strata should be constructed so that their share of the sample is proportionate to the share of the stratum in the actual population. Thus, if the rural component of some desired population is 52 percent, the rural component of the sample should also be 52 percent.
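Proportionate allocation of this kind is simple to compute. The following is a minimal sketch, not a prescribed procedure: the function name, the total sample of 2,000, and the 48 percent urban share (taken as the complement of the 52 percent rural figure) are assumptions made for illustration.

```python
def proportionate_allocation(total_sample, population_shares):
    """Split a sample across strata in proportion to their population shares.

    Uses largest-remainder rounding so that the whole-number allocations
    always add up to total_sample.
    """
    exact = {s: total_sample * share for s, share in population_shares.items()}
    allocation = {s: int(value) for s, value in exact.items()}
    leftover = total_sample - sum(allocation.values())
    # Hand any leftover interviews to the strata with the largest remainders.
    by_remainder = sorted(exact, key=lambda s: exact[s] - allocation[s], reverse=True)
    for stratum in by_remainder[:leftover]:
        allocation[stratum] += 1
    return allocation

# Hypothetical two-stratum design: 52 percent rural, 48 percent urban.
shares = {"rural": 0.52, "urban": 0.48}
print(proportionate_allocation(2_000, shares))
```

With these figures the rural stratum receives 1,040 interviews and the urban stratum 960.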
Once a decision is made to stratify along more than two dimensions, however, deciding upon the actual composition of the sample can get quite complicated. A national sample in South Africa, for instance, might dictate that a given number of mixed-race rural people living in the Western Cape is needed, as well as a given number of African and white rural people from that province. This would also mean getting given numbers of metropolitan people from each group in that province. Since this can become quite difficult, a trained demographer or statistician can help work it out.
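A rough count shows why the composition gets complicated so quickly. The sketch below simply enumerates every combination of three stratification dimensions. The category lists are assumptions chosen for illustration (the nine provinces under their current names, the four population groups mentioned in this text, and a two-way rural/metropolitan split); a real design would define them with care.

```python
from itertools import product

# Illustrative stratification dimensions (assumed, not a recommended design).
provinces = [
    "Eastern Cape", "Free State", "Gauteng", "KwaZulu-Natal", "Limpopo",
    "Mpumalanga", "Northern Cape", "North West", "Western Cape",
]
groups = ["African", "mixed-race", "Indian-background", "white"]
settlements = ["rural", "metropolitan"]

# Every combination of province, group and settlement type is one cell
# that needs its own target number of interviews.
cells = list(product(provinces, groups, settlements))

total_sample = 2_000
print(f"Number of cells: {len(cells)}")
print(f"Average interviews per cell: {total_sample / len(cells):.1f}")
```

Three dimensions already produce 72 cells, and a sample of 2,000 spread across them averages fewer than thirty interviews per cell, with many cells far smaller once real population proportions are applied.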
In some instances, however, it may be desirable to have a disproportionate stratified random sample. This usually occurs when some desired subgroup comprises a very small proportion of the desired population. A proportionate sample in South Africa, for instance, would consist of only 8.9 percent mixed-race respondents and around 2.5 percent Indian-background respondents. But if, for reasons of cost, the national sample is only 2,000 people, this would result in fewer than two hundred mixed-race respondents (about 178) and only around forty to fifty Indian-background respondents.
It may not be possible, however, to base any reliable statistical estimates on a subsample of forty people. Even with two hundred people, the statistical margin of error may be so large that projections about mixed-race voters would not be very helpful to guide a voter education programme targeted to these communities. This would become even more important if one wanted to examine the differences between men and women, or urban and rural people, or party supporters, within mixed-race and Indian-background subsamples. The numbers of respondents within these subgroups would start to become unhelpfully small.
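To give a sense of scale, the sketch below computes approximate margins of error for subsamples of different sizes, and the sample size needed for a target margin. It assumes a simple random sample, a 95 percent confidence level (z = 1.96) and the most conservative proportion (p = 0.5); real stratified or clustered designs will differ, so the figures are indicative only.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion, simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)

def required_sample_size(target_margin, p=0.5, z=1.96):
    """Smallest simple random sample giving at most the target margin of error."""
    return math.ceil(z ** 2 * p * (1 - p) / target_margin ** 2)

for n in (40, 200, 2_000):
    print(f"n = {n:>5}: +/- {margin_of_error(n) * 100:.1f} percentage points")

print("Interviews needed for +/- 5 points:", required_sample_size(0.05))
```

On these assumptions a subsample of forty carries a margin of error of more than fifteen percentage points, and a subsample of two hundred about seven, compared with roughly two points for the full sample of 2,000.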
Thus, an "oversample" of small groups such as these might be considered. In this case, while some small group might merit only forty interviews on a strictly proportionate basis, a decision could be made to conduct a hundred interviews in order to have a more reliable base of information. Once all the data has been collected, this disproportionate sampling is corrected by "weighting" the hundred interviews downward by the appropriate ratio so that they represent the correct proportion of the entire sample.
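The correction for an oversample can be sketched as follows. The group labels, the function name and the figure of 1,960 interviews for everyone else are hypothetical; the weight is taken here simply as the proportionate number of interviews divided by the number actually conducted.

```python
def design_weights(proportionate_counts, interviews_conducted):
    """Weight per respondent in each group: proportionate count / actual count."""
    return {
        group: proportionate_counts[group] / interviews_conducted[group]
        for group in proportionate_counts
    }

# Hypothetical design: the small group merits 40 interviews but receives 100.
proportionate = {"small group": 40, "everyone else": 1_960}
conducted = {"small group": 100, "everyone else": 1_960}

weights = design_weights(proportionate, conducted)

# After weighting, each group counts for its proportionate number of interviews.
weighted_counts = {g: conducted[g] * weights[g] for g in conducted}
small_group_share = weighted_counts["small group"] / sum(weighted_counts.values())

print(weights)
print(f"Weighted share of small group: {small_group_share:.1%}")
```

Each of the hundred oversampled respondents counts as 0.4 of an interview in national estimates, so the group returns to its proportionate two percent share, while all one hundred interviews remain available for analysis within the group.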
Weighting is also useful for other important demographic attributes for which population figures exist, but which cannot be known for any individual until the interview begins. The number of men and women in the desired population, as well as the number of people in various educational strata, may be known, for example. But it may not be possible to stratify the sample according to these traits a priori, because the interviewer will not learn whether the respondent is male or female, or how much education they have, until someone answers the door or the telephone.
Once the interviews are complete, the sample can be compared to the actual population along the demographic lines for which information is available, and the people in the sample can be weighted upward or downward accordingly. Imagine, for instance, that the share of women in the sample is twice their share of the overall population. In this case, each woman in the sample would be weighted downward by half, and each man weighted upward correspondingly, to bring the proportion of women in the sample to its proper level.
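This kind of after-the-fact weighting can be sketched in a few lines. The figures below are purely illustrative and were chosen only so that women are over-represented by a factor of two (60 percent of the sample against an assumed 30 percent of the population); the function name is hypothetical.

```python
from collections import Counter

def post_stratification_weights(sample_categories, population_shares):
    """Weight for each category: population share divided by sample share."""
    counts = Counter(sample_categories)
    n = len(sample_categories)
    return {
        category: population_shares[category] / (counts[category] / n)
        for category in counts
    }

# Illustrative sample of 1,000 respondents in which women are over-represented.
sample = ["woman"] * 600 + ["man"] * 400
population = {"woman": 0.30, "man": 0.70}

weights = post_stratification_weights(sample, population)
weighted_women = 600 * weights["woman"]
weighted_men = 400 * weights["man"]

print(weights)
print(f"Weighted share of women: {weighted_women / (weighted_women + weighted_men):.0%}")
```

Here each woman receives a weight of 0.5 and each man a weight of 1.75, which restores the assumed population proportions in the weighted sample.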
The design of the sample is a key determinant of the cost of a given survey. Samples that require a high proportion of rural respondents will tend to be relatively expensive, given the cost of getting interviewers deep into rural areas.
Surveying representative samples on a national basis usually requires a considerable amount of infrastructure and personnel. Thus, most non-governmental organisations, even if they are able to design the project and analyse the results, will still contract a professional organisation to conduct the actual interviews.