Weights of survey data


When conducting a survey, having a representative sample of the population is of paramount importance. But in practice, you are prone to over-sample some kinds of people and under-sample others. Weighting is a statistical technique to compensate for this type of 'sampling bias'. A weight is assigned to:

  • Reflect the data item's relative importance based on the objective of the data collection;
  • Take into account the characteristics of sampling design;
  • Reduce bias arising from nonresponse when the characteristics of the respondents differ from those not responding;
  • Correct identifiable deviations from population characteristics.

Each case in the data file is assigned a coefficient, its individual weight, by which the case is multiplied so that the weighted sample attains the desired characteristics.

Different types of weights and their different purposes


Different types of weights serve different purposes and affect data analysis in different ways.

Whether or not to use weights is not a straightforward question. For particular methods of analysis (e.g., estimating associations, regressions, etc.), using weights may be counterproductive. There are also general theoretical and methodological issues which discourage some researchers from using weights. However, different types of weights are useful for different purposes, and in some situations you must take an appropriate weight into account in your analysis (see the types of weighting described below).

In all cases, if there are any weights in your data file, the rationale and calculation of the weights must be detailed in the data documentation.

Design weights adjust for differences in individual units’ probabilities of being sampled. These probabilities are normally unequal when complex sampling procedures are used that combine multiple methods (stratification, cluster sampling) in several stages. For example, suppose individuals are the units of analysis, but households are sampled in the first stage and one respondent is then selected within each household. A respondent’s probability of being selected then depends on the number of household members: a person living alone is certain to be chosen once the household is drawn, while a person in a four-member household has only a one-in-four chance.

To correct for these differences in sampling probabilities, we compute design weights. A design weight is equal to the inverse of the unit's probability of inclusion in the sample. Before any rescaling, the sum of all design weights should approximately equal the total number of units in the population.
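The household example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 1,000 households, the sample of four households, and the household sizes are all invented, and the design (households drawn with equal probability, then one adult drawn at random within each) is a simplification rather than the procedure of any particular survey.

```python
# Hypothetical two-stage design; every number below is an illustrative assumption.
HOUSEHOLDS_IN_POPULATION = 1000  # assumed size of the household sampling frame
HOUSEHOLDS_SAMPLED = 4           # assumed number of households drawn in stage 1

# Number of eligible adults in each sampled household (made-up values).
adults_per_household = [1, 2, 2, 4]


def design_weight(n_adults):
    """Return the inverse of one respondent's inclusion probability."""
    p_household = HOUSEHOLDS_SAMPLED / HOUSEHOLDS_IN_POPULATION  # stage 1
    p_person = 1 / n_adults                                      # stage 2
    return 1 / (p_household * p_person)


weights = [design_weight(n) for n in adults_per_household]

# The sum of the raw design weights estimates the number of adults in the population.
estimated_population = sum(weights)

# Data files often rescale weights so that they sum to the sample size instead.
scaled_weights = [w * len(weights) / estimated_population for w in weights]

print(weights)
print(estimated_population)
print(scaled_weights)
```

With these assumed inputs, the respondent from the four-adult household receives four times the weight of the respondent who lives alone, and the raw weights sum to about 2,250, which is the estimated number of adults in the population.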

During the implementation of a survey, we are normally not able to get a response from some of the targeted respondents we sampled due to:

  • Their refusal;
  • Our failure to contact them;
  • Other administrative reasons.

Response rates differ between various population groups and those inequalities can be compensated for by weighting.
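One common way to compensate for unequal response rates is a weighting-class adjustment, in which each respondent is weighted by the inverse of the response rate in their group. The sketch below assumes this technique; the source does not say which method is used, and the age groups and counts are invented for illustration.

```python
# Hypothetical counts per age group; all numbers are illustrative assumptions.
sampled = {"18-29": 200, "30-59": 500, "60+": 300}    # units drawn into the sample
responded = {"18-29": 80, "30-59": 300, "60+": 240}   # units that actually responded


def nonresponse_weights(sampled_counts, responded_counts):
    """Return the inverse response rate for each group."""
    return {
        group: sampled_counts[group] / responded_counts[group]
        for group in sampled_counts
    }


nr_weights = nonresponse_weights(sampled, responded)

# After weighting, the respondents in each group stand in for everyone sampled in it.
weighted_respondents = {g: responded[g] * nr_weights[g] for g in responded}

print(nr_weights)
print(weighted_respondents)
```

Groups with low response rates (here the youngest group, at 40 percent) receive the largest weights, so the weighted respondents reproduce the composition of the original sample.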

The way certain characteristics of your sample, such as sex, age, and education, are distributed may differ from the way they are distributed in the actual population. For example, your sample may consist of 66 percent men when men make up only 48 percent of the population. Post-stratification weighting brings the sample distribution into line with such known characteristics of the population. It is called a post-stratification weight because it can only be computed after you have collected all of your data. The term stratification refers to the known strata of the population (such as age groups or sexes).
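The arithmetic behind the example of 66 percent men in the sample versus 48 percent in the population can be sketched as follows. The shares for women (34 and 52 percent) are not given in the text; they are assumed here as the complements of the figures for men.

```python
# Shares for men come from the example in the text; shares for women are
# assumed to be the complements (a two-category sex variable).
sample_share = {"men": 0.66, "women": 0.34}
population_share = {"men": 0.48, "women": 0.52}


def poststratification_weights(sample, population):
    """Return population share divided by sample share for each stratum."""
    return {stratum: population[stratum] / sample[stratum] for stratum in sample}


ps_weights = poststratification_weights(sample_share, population_share)

# Applying the weights reproduces the population distribution.
weighted_share = {s: sample_share[s] * ps_weights[s] for s in sample_share}

print(ps_weights)
print(weighted_share)
```

Men are over-represented, so each man receives a weight below 1 (about 0.73), while each woman receives a weight above 1 (about 1.53). After weighting, men account for 48 percent of the sample.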

Different groups may be represented in the data file in different proportions than they are in reality. Such discrepancies are normally compensated for through weighting. For example, international data files combine data from various countries. Similarly large surveys are usually implemented in each of these countries, although their total populations differ radically in size. If we want to analyse data about a population spanning several countries, such as that of Europe, then we have to adjust the proportions in which the individual European countries are represented.
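A minimal sketch of a population size weight is given below. It assumes two hypothetical countries, A and B, with equal sample sizes and invented population figures, and it uses the simple ratio of population to sample size; actual data files may scale this ratio differently.

```python
# Hypothetical countries; populations and sample sizes are illustrative assumptions.
population = {"A": 10_000_000, "B": 80_000_000}
sample_size = {"A": 2_000, "B": 2_000}


def population_size_weights(pop, n):
    """Return the number of people each respondent represents, per country."""
    return {country: pop[country] / n[country] for country in pop}


pop_weights = population_size_weights(population, sample_size)

# Share of each country in the pooled file, before and after weighting.
total_n = sum(sample_size.values())
unweighted_share = {c: sample_size[c] / total_n for c in sample_size}

weighted_n = {c: sample_size[c] * pop_weights[c] for c in sample_size}
total_weighted = sum(weighted_n.values())
weighted_share = {c: weighted_n[c] / total_weighted for c in weighted_n}

print(pop_weights)
print(unweighted_share)
print(weighted_share)
```

Without the weight, each country contributes half of the pooled file. With it, country B contributes eight ninths, which matches its share of the combined population.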

The data file may include several different types of weights for different purposes. These can then be combined into a single final weight.
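The text does not specify how the individual weights are combined. A common convention, assumed in the sketch below, is to multiply the component weights for each respondent. The field names and all values are hypothetical.

```python
# Hypothetical respondents; field names and values are illustrative assumptions.
respondents = [
    {"design": 1.0, "nonresponse": 1.25, "poststrat": 0.8, "answer": "yes"},
    {"design": 2.0, "nonresponse": 1.0, "poststrat": 1.5, "answer": "no"},
    {"design": 0.5, "nonresponse": 2.0, "poststrat": 1.0, "answer": "yes"},
    {"design": 1.0, "nonresponse": 1.0, "poststrat": 1.0, "answer": "no"},
]


def final_weight(row):
    """Combine the component weights by multiplication (assumed convention)."""
    return row["design"] * row["nonresponse"] * row["poststrat"]


def weighted_percent(rows, value):
    """Return the weighted percentage of rows whose answer equals `value`."""
    total = sum(final_weight(r) for r in rows)
    matching = sum(final_weight(r) for r in rows if r["answer"] == value)
    return 100 * matching / total


unweighted_yes = 100 * sum(r["answer"] == "yes" for r in respondents) / len(respondents)
weighted_yes = weighted_percent(respondents, "yes")

print(unweighted_yes)
print(weighted_yes)
```

In this made-up file, half of the respondents answer "yes", but they carry only a third of the total weight, so the weighted estimate is about 33 percent rather than 50 percent.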

Source: Data files from the ESS, round 8, Czechia (European Social Survey, 2016).

Variable name: netusoft
Question: How often the respondent uses the internet

In the first pair of columns, no weight was applied.
In the second pair of columns, the design weight (DWEIGHT) was applied to adjust for different selection probabilities.

 

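| | No weight: Frequency | No weight: Valid percent | Design weight: Frequency | Design weight: Valid percent |
|---|---|---|---|---|
| 1 Never | 244 | 10.8 | 187 | 8.2 |
| 2 Only occasionally | 162 | 7.1 | 155 | 6.8 |
| 3 A few times a week | 302 | 13.3 | 284 | 12.5 |
| 4 Most days | 384 | 16.9 | 379 | 16.6 |
| 5 Every day | 1177 | 51.9 | 1271 | 55.8 |
| Total | 2269 | 100 | 2277 | 100 |
| System missing | 31 | | 23 | |
| Total | 2300 | | 2300 | |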

 


An example: Using weights in European Social Survey data

The following table provides an illustration of using weights in the data from the European Social Survey (n.d.) (ESS). There are three different weights available in the ESS Source Main Questionnaire data file (see European Social Survey, 2014):

  1. The design weight takes into consideration the different probabilities of being sampled given the sampling methods implemented in individual countries;
  2. The post-stratification weight corrects for the differences of the sample from selected population characteristics caused by other sampling and non-sampling errors;
  3. The population size weight corrects for the fact that the individual countries’ sample sizes are very similar while there are large variations in the size of their actual populations.

Different types of data analysis then require the use of different weights or their combinations. When analysing data from one country alone or comparing data of two or more countries, only the design weight or the post-stratification weight needs to be applied. When combining different countries, design or post-stratification weights in combination with population size weights should be applied.
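The rule described above can be sketched as a small helper that picks the weight according to the type of analysis. The field names `dweight` and `pweight` stand for a design weight and a population size weight; the countries, the weight values, and the helper functions are hypothetical, and multiplying the two weights when pooling countries is an assumption of this sketch.

```python
# Hypothetical pooled file with two countries; all values are illustrative.
rows = [
    {"cntry": "A", "voted": 1, "dweight": 1.0, "pweight": 0.5},
    {"cntry": "A", "voted": 1, "dweight": 1.0, "pweight": 0.5},
    {"cntry": "B", "voted": 1, "dweight": 0.5, "pweight": 4.0},
    {"cntry": "B", "voted": 0, "dweight": 1.5, "pweight": 4.0},
]


def analysis_weight(row, combine_countries):
    """Design weight alone, or multiplied by the population size weight when pooling."""
    if combine_countries:
        return row["dweight"] * row["pweight"]
    return row["dweight"]


def turnout(data, combine_countries):
    """Return the weighted share of respondents who voted."""
    total = sum(analysis_weight(r, combine_countries) for r in data)
    voted = sum(analysis_weight(r, combine_countries) * r["voted"] for r in data)
    return voted / total


# Single-country analysis: design weight only.
country_b = [r for r in rows if r["cntry"] == "B"]
turnout_b = turnout(country_b, combine_countries=False)

# Countries combined: design weight together with population size weight.
turnout_pooled = turnout(rows, combine_countries=True)

print(turnout_b)
print(turnout_pooled)
```

Here the pooled estimate is pulled towards country B's turnout because B's respondents carry the larger population size weight.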


Example – voter turnout (% of respondents voting in the last election)

| Type of analysis | Example | Weight to be used: Design weight / Post-stratification weight | Weight to be used: Population weight |
|---|---|---|---|
| To examine data from a single country – whether a single variable or a cross-tabulation | Voter turnout in Germany | X | |
| | Voter turnout in Germany by age and gender | X | |
| To compare results for two or more countries separately – without using totals or averages | Compare voter turnout in France, Germany, and the UK | X | |
| To combine countries – whether on a single variable or via a cross-tabulation | Voter turnout in Scandinavia | X | X |
| | Voter turnout in the EU | X | X |
| | Voter turnout across all countries participating in the ESS | X | X |
| | Compare voter turnout between EU member states and accession countries | X | X |
| | Voter turnout by age group across all ESS participating countries | X | X |

Source: European Social Survey, 2014.