How to Use SPSS for Generalized Estimating Equations
Generalized estimating equations (GEE) are a powerful tool for analyzing repeated measurements or other correlated observations, such as clustered data. GEE extends the generalized linear model to account for the within-subject or within-cluster correlation that is often present in longitudinal or clustered data. In this article, we show how to fit a GEE model in SPSS, using as an example a randomized controlled trial (RCT) that studied the effects of air pollution on children's respiratory health.
What is GEE?
GEE is a method of estimating the parameters of a generalized linear model when the observations are not independent. For example, in a longitudinal study, the same subjects are measured at different time points, and their responses may be correlated over time. Similarly, in a clustered study, the subjects are grouped into clusters, such as schools or hospitals, and their responses may be correlated within each cluster. GEE accounts for this correlation through a working correlation matrix that describes the assumed dependency structure of the data. GEE also provides robust (sandwich) standard errors, which remain valid in large samples even if the working correlation matrix is misspecified, provided the model for the mean is correctly specified.
To use SPSS for GEE, you need to have the Advanced Statistics module installed. Your data must be organized in long format, where each row represents one observation for one subject at one time point or within one cluster. You need variables that identify the subjects and the within-subject or within-cluster factors, such as time or cluster ID. You also need a dependent variable and any independent variables that you want to include in your model.
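The long format described above can be illustrated outside SPSS with a short Python sketch. This is only an illustration under assumed names: the variables child_id, group, and wheeze_t0 to wheeze_t3 are hypothetical, and the two records are made up to show the shape of the data. Within SPSS itself, the same restructuring is typically done with the Restructure Data Wizard or the VARSTOCASES command.

```python
# Hypothetical wide-format records: one row per child, one column per visit.
wide_rows = [
    {"child_id": 1, "group": "intervention",
     "wheeze_t0": 1, "wheeze_t1": 0, "wheeze_t2": 0, "wheeze_t3": 0},
    {"child_id": 2, "group": "control",
     "wheeze_t0": 1, "wheeze_t1": 1, "wheeze_t2": 0, "wheeze_t3": 1},
]


def wide_to_long(rows, id_var, fixed_vars, prefix, outcome, n_times):
    """Return one record per subject per time point (long format)."""
    long_rows = []
    for row in rows:
        for t in range(n_times):
            # Subject identifier and within-subject (time) variable.
            record = {id_var: row[id_var], "time": t}
            # Variables that do not change over time are repeated on every row.
            for name in fixed_vars:
                record[name] = row[name]
            # The repeated measurement for this time point.
            record[outcome] = row[f"{prefix}{t}"]
            long_rows.append(record)
    return long_rows


long_rows = wide_to_long(wide_rows, "child_id", ["group"], "wheeze_t", "wheeze", 4)
for record in long_rows:
    print(record)
```

Each child now contributes four rows, one per time point, which is the layout the GEE procedure expects.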
Here are the steps to perform GEE in SPSS:
From the menus choose: Analyze > Generalized Linear Models > Generalized Estimating Equations...
On the Repeated tab, select one or more subject variables that uniquely define the subjects within the dataset. For example, a single Patient ID variable should be sufficient to define subjects in a single hospital, but the combination of Hospital ID and Patient ID may be necessary if patient identification numbers are not unique across hospitals.
On the Type of Model tab, specify a distribution and link function for your dependent variable. For example, if your dependent variable is binary, you can choose Binomial distribution and Logit link function.
On the Response tab, select your dependent variable.
On the Predictors tab, select any factors and covariates that you want to use as predictors in your model.
On the Model tab, specify model effects using the selected factors and covariates. You can use main effects, interactions, nested effects, or custom effects.
Optionally, on the Repeated tab you can specify:
Within-subject variables that define the ordering of measurements within subjects. For example, if you have repeated measurements over time, you can use a Time variable as a within-subject variable.
Working Correlation Matrix structure that represents the correlation structure of your data. The available structures are Independent, AR(1), Exchangeable, M-dependent, and Unstructured. For example, if you assume that the correlation between measurements decreases as the time interval increases, you can choose the AR(1) structure.
Click OK to run the analysis and view the output.
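To see what the choice of correlation structure on the Repeated tab implies, the following Python sketch builds two common working correlation matrices for four time points. It is an illustration only: the value rho = 0.5 is an arbitrary assumption, and the function names are hypothetical. In an actual analysis SPSS estimates the correlation parameter from the data.

```python
def exchangeable(n_times, rho):
    """Every pair of measurements on the same subject has correlation rho."""
    return [[1.0 if i == j else rho for j in range(n_times)]
            for i in range(n_times)]


def ar1(n_times, rho):
    """Correlation decays as rho ** |i - j| with the gap between time points."""
    return [[rho ** abs(i - j) for j in range(n_times)]
            for i in range(n_times)]


def show(name, matrix):
    """Print a matrix with three decimals per entry."""
    print(name)
    for row in matrix:
        print("  " + "  ".join(f"{value:.3f}" for value in row))


rho = 0.5  # assumed value, for illustration only
show("Exchangeable", exchangeable(4, rho))
show("AR(1)", ar1(4, rho))
```

Under the exchangeable structure, measurements three visits apart are as strongly correlated as adjacent ones. Under AR(1), the correlation shrinks with each additional step between visits, which matches the assumption that measurements closer in time are more alike.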
We will use an example from Shintani, who used GEE to analyze repeatedly measured binary outcome data from an RCT that studied the effects of air pollution on children's respiratory health. The data consist of 1,000 children who were randomly assigned to either an intervention group that received air purifiers or a control group that did not. The outcome variable is wheeze status (yes or no) measured at four time points: baseline (T0), 3 months (T1), 6 months (T2), and 9 months (T3). The predictor variables are group (intervention or control), age (in years), sex (male or female), and asthma status (yes or no) at baseline.
We will use SPSS to fit a GEE model with binomial distribution and logit link function for wheeze status as the dependent variable and group, age, sex, asthma status, time, and their interactions as predictors. We will use child ID as the subject variable and time as the within-subject variable.
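To make the role of the logit link in this model concrete, the sketch below shows in Python how a linear predictor maps to a probability of wheeze and how a coefficient translates into an odds ratio. The coefficient values are hypothetical and chosen only for illustration; they are not results from the study.

```python
import math


def logit(p):
    """Logit link: map a probability to the log-odds scale."""
    return math.log(p / (1.0 - p))


def inverse_logit(eta):
    """Inverse link: map a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients on the log-odds scale (not estimates from the trial).
intercept = -1.0  # log-odds of wheeze in the control group
b_group = -0.5    # change in log-odds for the intervention group

p_control = inverse_logit(intercept)
p_intervention = inverse_logit(intercept + b_group)
odds_ratio = math.exp(b_group)  # exponentiated coefficient

print(f"P(wheeze | control)      = {p_control:.3f}")
print(f"P(wheeze | intervention) = {p_intervention:.3f}")
print(f"Odds ratio for group     = {odds_ratio:.3f}")
```

Because the model is linear on the log-odds scale, exponentiating a coefficient gives the odds ratio associated with that predictor, which is usually how the parameter estimates from a binomial-logit GEE are reported.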