Issues with Clustered Data
Spring 2026 | CLAS | PSYC 894
Jeffrey M. Girard | Lecture 03a
Multilevel Data and Questions
Conceptual Issues with Clustering
Statistical Issues with Clustering
Random Clusters…
Fixed Groups…
Scenario: We are studying employees (L1) nested within companies (L2).
For each variable below, is it Global, Structural, or Disaggregated?
Individuals who engage in more rigorous physical activity generally have a lower average resting heart rate. However, in the moment, engaging in more rigorous physical activity temporarily increases heart rate.
Caution
Wealthier countries tend to have higher average happiness levels. However, this does not necessarily imply that, within a given country, wealthier individuals tend to be happier. In fact, this effect tends to be much weaker.
Caution
Wealthier individuals in the USA tend to vote more conservatively. However, this does not necessarily imply that wealthier regions also tend to vote more conservatively. In fact, the opposite is true: wealthier US states and counties tend to vote more liberally.
Caution
A new treatment may work (or not work) in a given hospital. However, this does not necessarily imply that it would also work (or not work) in all hospitals. There may be other hospital-level moderators at play (e.g., due to different staff, patients, and environmental factors).
Caution
Simulation: We simulate 500 datasets where no effect exists (\(\beta=0\)). We analyze the exact same data two ways: ignoring clustering (LM) and accounting for it (MLM). Because \(H_0\) is true, only about 5% of slope p-values should fall below .05. This holds for MLM but not for LM.
\[ SE_{\beta_p}=\frac{SD_Y}{SD_{X_p}} \sqrt{\frac{1-R^2}{(n-k-1)(1-R_p^2)}} \]
\(n\) is the sample size (assuming IID)
\(k\) is the number of predictor (\(X\)) variables
\(R^2\) is the proportion of variance in \(Y\) explained by all \(X\)
\(R_p^2\) is the proportion of variance in \(X_p\) explained by all other \(X\)
Warning
If \(n\) overstates the number of independent observations (as it does when clustered data are treated as IID), then \(SE_{\beta_p}\) will be too small.
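To see how much an overstated \(n\) can matter, compare the standard error under two sample sizes while holding every other term in the formula fixed. The values here (\(k = 1\), \(n = 3000\) versus \(n = 300\)) are hypothetical and chosen only for illustration:

\[ \frac{SE_{\beta_p}(n=300)}{SE_{\beta_p}(n=3000)} = \sqrt{\frac{3000-1-1}{300-1-1}} = \sqrt{\frac{2998}{298}} \approx 3.17 \]

In this sketch, treating 3000 observations as independent when the data carry only as much information as 300 would make the standard error roughly three times too small.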
\[ ICC = \rho = \frac{\tau_{00}^2}{\tau_{00}^2+\sigma^2} \]
easystats usually reports the SDs (\(\sigma\) and \(\tau_{00}\)), so you must square them to get the variances for this formula.
\[ DEFF=1+\rho(\bar{n}_j - 1) \]
where \(\rho\) is the ICC and \(\bar{n}_j\) is the average cluster size
The design effect (DEFF) can also be used to calculate the effective sample size \((n^*)\) accounting for clustering
\[ n^*=\frac{n}{1+\rho(\bar{n}_j - 1)} \]
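As a quick illustration with hypothetical values (\(\rho = 0.20\), \(\bar{n}_j = 20\), and \(n = 1000\), none of which come from a real dataset), the design effect and effective sample size work out to:

\[ DEFF = 1 + 0.20(20 - 1) = 4.8 \qquad n^* = \frac{1000}{4.8} \approx 208 \]

Under these assumptions, 1000 clustered observations carry about as much information as 208 independent ones.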
Implications
I recruited 30 participants to each complete 100 trials of a behavioral experiment; thus, my data has 3000 observations. I ran a linear regression and my hypothesized effect was significant! However, my annoying friend told me that I am ignoring “clustering” or something and that my effect might not be significant after all. My other, nice friend explained how to calculate the necessary information, but the software output only gave me Standard Deviations (SDs).
Calculate the ICC, DEFF, DEFT, and \(n^*\) given the following estimates:
Intercept SD (tau00): 1.20
Residual SD (sigma): 4.60
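One way to work through this exercise is sketched here. It assumes \(\bar{n}_j = 100\) (every participant completed all 100 trials) and uses the standard definition \(DEFT = \sqrt{DEFF}\), which the formulas earlier in this lecture do not state explicitly:

\[ ICC = \frac{1.20^2}{1.20^2 + 4.60^2} = \frac{1.44}{1.44 + 21.16} \approx 0.0637 \]
\[ DEFF = 1 + 0.0637(100 - 1) \approx 7.31 \]
\[ DEFT = \sqrt{7.31} \approx 2.70 \]
\[ n^* = \frac{3000}{7.31} \approx 410 \]

Under these assumptions, the 3000 observations behave like roughly 410 independent ones, and the standard errors from the linear regression would be too small by a factor of about 2.7.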