What are aggregated data?

In a non-aggregated dataset, each row of the data matrix represents one case. For example, in an individual-level dataset each row represents one person; in a household-level dataset each row represents one household. In an aggregated dataset, each row represents one combination of characteristics, and a count variable (here named CASCOUNT) records the number of cases that share that combination.

For example, suppose an aggregated dataset was based on individual persons and included the variables:

  • age group
  • sex
  • marital status

In this dataset, one row would represent all widowed women in the oldest age group. The variable CASCOUNT would show the number of such cases in the sample (for example, 150).
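
The idea can be illustrated with a short Python sketch. Python is used only for illustration and is not part of the Stata or SPSS workflow described here; the records, the value labels and the function name aggregate are all hypothetical. The sketch collapses individual-level rows, one per person, into one row per distinct combination of age group, sex and marital status, and appends a CASCOUNT column.

```python
from collections import Counter

# Hypothetical individual-level records: one tuple per person,
# holding (age group, sex, marital status).
persons = [
    ("75+", "female", "widowed"),
    ("75+", "female", "widowed"),
    ("75+", "female", "married"),
    ("16-24", "male", "single"),
    ("16-24", "male", "single"),
    ("16-24", "female", "single"),
]


def aggregate(records):
    """Collapse one-row-per-case records into one row per distinct
    combination of characteristics, with CASCOUNT as the last column."""
    counts = Counter(records)
    # Each output row: the combination's values followed by its count.
    return [combo + (cascount,) for combo, cascount in sorted(counts.items())]


aggregated = aggregate(persons)
for row in aggregated:
    print(row)
```

Here the two widowed women aged 75+ become a single row with CASCOUNT 2, and the CASCOUNT values add up to the original number of persons.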

How are aggregated data analysed?

An aggregated dataset can be analysed in the same way as a non-aggregated dataset: simply weight the data by the count variable (here named CASCOUNT). Because each row is then counted as many times as its CASCOUNT value, the results will be identical to those obtained from the equivalent non-aggregated dataset.
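
A minimal Python sketch of why this works, assuming a small hypothetical aggregated dataset whose last column is CASCOUNT: a frequency count weighted by CASCOUNT matches an unweighted count taken on the data expanded back to one row per case. The names weighted_freq and expand are illustrative only.

```python
from collections import Counter

# Hypothetical aggregated rows: (agegroup, sex, marstat, cascount).
aggregated = [
    ("16-24", "female", "single", 1),
    ("16-24", "male", "single", 2),
    ("75+", "female", "married", 1),
    ("75+", "female", "widowed", 2),
]


def weighted_freq(rows, index):
    """Frequency of the variable at position `index`, weighting each
    row by its CASCOUNT (the last column)."""
    freq = Counter()
    for row in rows:
        freq[row[index]] += row[-1]
    return freq


def expand(rows):
    """Rebuild non-aggregated data: repeat each combination CASCOUNT times
    and drop the count column."""
    return [row[:-1] for row in rows for _ in range(row[-1])]


by_weight = weighted_freq(aggregated, 0)
by_expansion = Counter(row[0] for row in expand(aggregated))
print(by_weight == by_expansion)
print(dict(by_weight))
```

Expanding the data is shown only to check the equivalence; in practice weighting avoids building the larger dataset at all.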

In Stata, this is done by adding the frequency weight [fweight=cascount] to every analysis command, for example:

  bysort sex: tabulate agegroup [fweight=cascount], missing
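
For readers without Stata, the following Python sketch loosely mirrors what that command reports: for each sex, the weighted frequency and percentage of each age group, with missing values (represented here as None) kept as a category, as the missing option does. The data and the function tabulate_by are hypothetical, and the cumulative-percentage column that Stata also prints is omitted.

```python
from collections import Counter, defaultdict

# Hypothetical aggregated rows: (sex, agegroup, cascount).
# None marks a missing age group.
aggregated = [
    ("female", "16-24", 1),
    ("female", "75+", 3),
    ("female", None, 1),
    ("male", "16-24", 2),
    ("male", "75+", 2),
]


def tabulate_by(rows):
    """For each sex, return {agegroup: (frequency, percent)}, using
    CASCOUNT as a frequency weight and keeping missing (None) as a value."""
    counts = defaultdict(Counter)
    for sex, agegroup, cascount in rows:
        counts[sex][agegroup] += cascount
    tables = {}
    for sex, freq in counts.items():
        total = sum(freq.values())
        tables[sex] = {g: (n, round(100 * n / total, 2)) for g, n in freq.items()}
    return tables


for sex, table in sorted(tabulate_by(aggregated).items()):
    print(sex, table)
```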

In SPSS, weight the dataset by the count variable before starting the analysis; the syntax for this is:

  Weight by cascount.

Last modified 17 February 2009