Overlapping Data

Most of the statistical tests are based on standard t tests, which assume that the two samples being compared are independent of each other. When the columns of a table are formed from the categories of a multiple response variable, data from the same case can be present in both of the columns being tested. This is known as overlapping data, and it means that the two samples cannot be considered independent.

For example, the multiple response variable museums is based on a question for which respondents can select any number of responses. When this variable is on the top of a table, respondents who selected more than one response appear in more than one column.
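To make the idea of overlap concrete, the following is a minimal Python sketch that counts how many cases two columns share. The respondent IDs, the category names, and the helper functions are hypothetical and chosen only for illustration; they are not taken from the museums variable or from the product.

```python
from itertools import combinations

# Hypothetical responses: each respondent may select any number of categories.
responses = {
    "r1": {"Science", "Art"},
    "r2": {"Art"},
    "r3": {"Science", "History", "Art"},
    "r4": {"History"},
    "r5": set(),  # selected nothing, so appears in no column
}


def column_cases(responses, category):
    """Return the IDs of the respondents who fall in the column for `category`."""
    return {rid for rid, chosen in responses.items() if category in chosen}


def overlap_counts(responses, categories):
    """Count the cases shared by each pair of columns."""
    columns = {c: column_cases(responses, c) for c in categories}
    return {
        (a, b): len(columns[a] & columns[b])
        for a, b in combinations(categories, 2)
    }


counts = overlap_counts(responses, ["Science", "Art", "History"])
for pair, shared in counts.items():
    print(pair, shared)
```

Any pair with a nonzero count contains overlapping data, so the two columns in that pair cannot be treated as independent samples.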

When the columns of a table are formed from the categories of a single response variable, the data in the columns are mutually exclusive, although this does not necessarily guarantee that they are independent.

IBM® SPSS® Data Collection Survey Reporter can perform the column proportions and column means tests on overlapping data because it can detect overlapping data in the columns being tested and use a formula to compensate for the fact that some cases appear in more than one column. The chi-square test cannot be performed on overlapping data.
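The exact formula that the product uses is not given here. As an illustration only, the sketch below shows one textbook way to adjust a pooled two-proportion test for overlap: the cases that appear in both columns contribute a covariance term, which is subtracted from the variance of the difference. It assumes unweighted data, and the function name and parameters are hypothetical.

```python
import math


def overlap_adjusted_t(x1, n1, x2, n2, n0):
    """Test statistic for the difference between two column proportions.

    x1, x2: number of cases in the row for columns 1 and 2
    n1, n2: column bases
    n0:     number of cases that appear in both columns (the overlap)

    Assumption: unweighted data and a pooled proportion under the null
    hypothesis. This is an illustrative adjustment, not the product's formula.
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("column bases must be positive")
    if not 0 <= n0 <= min(n1, n2):
        raise ValueError("overlap must lie between 0 and the smaller base")

    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)

    # With n0 = 0 this reduces to the usual independent-samples term.
    scale = 1 / n1 + 1 / n2 - 2 * n0 / (n1 * n2)
    variance = pooled * (1 - pooled) * scale
    if variance <= 0:
        raise ValueError("test statistic is undefined for these inputs")

    return (p1 - p2) / math.sqrt(variance)


t_independent = overlap_adjusted_t(60, 100, 45, 100, n0=0)
t_overlapping = overlap_adjusted_t(60, 100, 45, 100, n0=40)
print(t_independent, t_overlapping)
```

In this sketch the overlap reduces the estimated variance of the difference, so the same pair of proportions yields a larger test statistic than it would if the columns were treated as independent.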

For more on the theory of overlapping samples, see Kish, L. Survey Sampling. New York: John Wiley and Sons. ISBN 0-471-48900-X.