Centering and Degrouping
Spring 2026 | CLAS | PSYC 894
Jeffrey M. Girard | Lecture 08a

library(tidyverse)
dat_raw <- read_csv("heck2011.csv") |>
  mutate(school = factor(school))
dat_l2 <- dat_raw |>
  summarize(
    .by = school,
    puniv = first(puniv),
    ses_b = mean(ses, na.rm = TRUE)
  )
If we ignore this and report the conflated effect anyway, we fall into a well-known trap:
“If the model does not properly account for the fact that there are different relationships between the predictor and the outcome at different levels, we end up with a slope that is an uninterpretable blend of the within-person and between-person slopes.” — Hamaker & Muthén (2020)
Why do we leave Y raw?
Here is a summary checklist for continuous variables in multilevel models:
Outcome (y): leave raw, so coefficients stay in the outcome's natural units
L1 predictors (x): split into a within-cluster component (wc or ws) and a between-cluster component (bc or bs)
L2 predictors (z): center (c) or standardize (s) using the one-row-per-cluster dataset, never the L1 rows
When we work with Level 1 data (one row per student), Level 2 variables (like school funding or the proportion going to university) are repeated for every student in that school
If we simply run center(puniv) on this dataset, the software calculates the mean across all rows, not across all schools
Why is this a problem?
To safely center and standardize our variables without size bias, we need to explicitly separate our data by level
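To make the size bias concrete, here is a toy example (hypothetical numbers, not from heck2011.csv): one school contributes three rows and another contributes one, so a row-wise mean is pulled toward the larger school.

```r
library(dplyr)

# Hypothetical data: school A has 3 students, school B has 1,
# so A's puniv value appears three times at Level 1
toy <- data.frame(
  school = c("A", "A", "A", "B"),
  puniv  = c(0.2, 0.2, 0.2, 0.8)
)

# Row-wise mean is weighted by school size: (3 * 0.2 + 0.8) / 4
mean(toy$puniv)  # 0.35

# Mean over schools (one row per school) is what centering should use
toy |>
  distinct(school, puniv) |>
  summarize(puniv_mean = mean(puniv))  # 0.50
```

Centering on 0.35 instead of 0.50 would shift every school's score by an amount that depends on how unbalanced the cluster sizes are.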
First, we load our data and create a Level 2 dataset. We grab our L2 predictor (puniv) and calculate the raw cluster mean of our L1 predictor, ses, storing it as ses_b
Because this dataset has exactly one row per school, we can safely center and standardize without weighting biases!
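The centering and standardizing code for this step is not shown on the slide; a sketch assuming the {datawizard} package (whose transformed columns print with the <dw_trnsf> class visible in the output below) might look like:

```r
library(tidyverse)
library(datawizard)  # assumed; provides center() and standardize()

# Transform at Level 2, where each school counts exactly once
dat_l2 <- dat_l2 |>
  mutate(
    puniv_c = center(puniv),       # grand-mean centered (over schools)
    ses_bc  = center(ses_b),       # centered cluster means
    puniv_s = standardize(puniv),  # z-scored (mean 0, SD 1 over schools)
    ses_bs  = standardize(ses_b)
  )
```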
Rows: 419
Columns: 7
$ school <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,…
$ puniv <dbl> 0.08333333, 1.00000000, 0.33333333, 1.00000000, 1.00000000, 1.…
$ ses_b <dbl> -0.2667500, 0.6799231, -0.5480000, 0.8159412, -0.3892941, 0.25…
$ puniv_c <dw_trnsf> -0.7816475, 0.1350191, -0.5316475, 0.1350191, 0.1350191, …
$ ses_bc <dw_trnsf> -0.2862979, 0.6603752, -0.5675479, 0.7963933, -0.4088420,…
$ puniv_s <dw_trnsf> -2.8454676, 0.4915165, -1.9353811, 0.4915165, 0.4915165, …
$ ses_bs <dw_trnsf> -0.5883235, 1.3570281, -1.1662739, 1.6365364, -0.8401436,…
Next, we take the original L1 data and join our new L2 data into it
Finally, we calculate the within-cluster component. Because cluster-mean-centered scores always sum to zero within a cluster, standardize() automatically calculates the correct pooled within-cluster standard deviation.
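A sketch of the join and within-cluster step (the exact code is not shown in the slides; column names follow the output below, and {datawizard} is assumed for standardize()):

```r
library(tidyverse)
library(datawizard)  # assumed

dat_l1 <- dat_raw |>
  left_join(dat_l2, by = "school") |>  # L2 values repeat within each school
  mutate(
    ses_wc = ses - ses_b,         # within-cluster (cluster-mean) centered
    ses_ws = standardize(ses_wc)  # SD of wc scores = pooled within-cluster SD
  )
```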
Rows: 6,871
Columns: 5
$ school <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, …
$ ses_b <dbl> -0.2667500, -0.2667500, -0.2667500, -0.2667500, -0.2667500, -0.…
$ ses <dbl> 0.586, 0.304, -0.544, -0.848, 0.001, -0.106, -0.330, -0.891, 0.…
$ ses_wc <dbl> 0.852750000, 0.570750000, -0.277250000, -0.581250000, 0.2677500…
$ ses_ws <dw_trnsf> 1.409368976, 0.943297969, -0.458220520, -0.960651677, 0.44…
ses_wc: The student’s relative wealth compared to their peers
ses_bc: The school’s overall average wealth
puniv_c: The school’s university attendance rate (as an L2 covariate)
Level One (Observation)
\[ y_{ij} = \beta_{0j} + \beta_{1j} x_{ij}^{(wc)} + \color{#4daf4a}{e_{ij}} \]
Level Two (Cluster)
\[ \begin{align} \beta_{0j} &= \color{#377eb8}{\gamma_{00}} + \color{#377eb8}{\gamma_{01}} x_j^{(bc)} + \color{#377eb8}{\gamma_{02}} z_j^{(c)} + \color{#e41a1c}{u_{0j}} \\ \beta_{1j} &= \color{#377eb8}{\gamma_{10}} + \color{#e41a1c}{u_{1j}} \end{align} \]
Substitute the L2 equations directly into the L1 equation:
\[ y_{ij} = \underbrace{\color{#377eb8}{\gamma_{00}} + \color{#377eb8}{\gamma_{01}} x_j^{(bc)} + \color{#377eb8}{\gamma_{02}} z_j^{(c)} + \color{#377eb8}{\gamma_{10}} x_{ij}^{(wc)}}_{\text{Fixed}} + \underbrace{\color{#e41a1c}{u_{0j}} + \color{#e41a1c}{u_{1j}} x_{ij}^{(wc)}}_{\text{Random}} + \color{#4daf4a}{e_{ij}} \]
Formula Syntax
y ~ 1 + x_between + x_within + z_l2 + (1 + x_within | cluster)
Our Example
math ~ 1 + ses_bc + ses_wc + puniv_c + (1 + ses_wc | school)
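The fitting code is not shown in the slides; one sketch, assuming {lme4} for estimation and {parameters} for the table below, and that the joined Level 1 data live in a hypothetical dat_l1:

```r
library(lme4)        # assumed estimation package
library(parameters)  # assumed table helper

fit_c <- lmer(
  math ~ 1 + ses_bc + ses_wc + puniv_c + (1 + ses_wc | school),
  data = dat_l1  # hypothetical name for the joined Level 1 dataset
)
model_parameters(fit_c, effects = "fixed")
```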
# Fixed Effects
Parameter | Coefficient | SE | 95% CI | t(6863) | p
--------------------------------------------------------------------
(Intercept) | 57.66 | 0.12 | [57.42, 57.90] | 470.53 | < .001
ses_bc | 5.60 | 0.26 | [ 5.09, 6.11] | 21.54 | < .001
ses_wc | 3.19 | 0.17 | [ 2.86, 3.51] | 19.14 | < .001
puniv_c | 1.52 | 0.47 | [ 0.60, 2.43] | 3.25 | 0.001
Because our predictors are centered, the coefficients represent changes in raw math points for a 1-unit increase in the raw predictor
Intercept (57.66): The expected math score for an average student (ses_wc = 0) at an average school (ses_bc = 0 and puniv_c = 0)
ses_bc (5.60): Controlling for student wealth and university attendance, a 1-unit increase in a school’s overall SES is associated with a 5.60 point increase in math
ses_wc (3.19): Controlling for school factors, a 1-unit increase in a student’s relative wealth is associated with a 3.19 point increase in math
puniv_c (1.52): Controlling for school and student SES, a 1-unit increase in the university attendance rate (i.e., going from 0% to 100%, since puniv is a proportion) is associated with a 1.52 point increase in math
The structure remains identical, but we swap in the standardized components
Level One (Observation)
\[ y_{ij} = \beta_{0j} + \beta_{1j} x_{ij}^{(ws)} + \color{#4daf4a}{e_{ij}} \]
Level Two (Cluster)
\[ \begin{align} \beta_{0j} &= \color{#377eb8}{\gamma_{00}} + \color{#377eb8}{\gamma_{01}} x_j^{(bs)} + \color{#377eb8}{\gamma_{02}} z_j^{(s)} + \color{#e41a1c}{u_{0j}} \\ \beta_{1j} &= \color{#377eb8}{\gamma_{10}} + \color{#e41a1c}{u_{1j}} \end{align} \]
Because our inputs are standardized, the resulting gammas will tell us the expected change in math points for a 1-SD increase in the predictors.
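The corresponding fit, sketched under the same assumptions ({lme4}, joined data in a hypothetical dat_l1), just swaps in the standardized columns:

```r
library(lme4)  # assumed

fit_s <- lmer(
  math ~ 1 + ses_bs + ses_ws + puniv_s + (1 + ses_ws | school),
  data = dat_l1  # hypothetical name for the joined Level 1 dataset
)
```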
# Fixed Effects
Parameter | Coefficient | SE | 95% CI | t(6863) | p
--------------------------------------------------------------------
(Intercept) | 57.66 | 0.12 | [57.42, 57.90] | 470.52 | < .001
ses_bs | 2.73 | 0.13 | [ 2.48, 2.98] | 21.54 | < .001
ses_ws | 1.93 | 0.10 | [ 1.73, 2.12] | 19.14 | < .001
puniv_s | 0.42 | 0.13 | [ 0.17, 0.67] | 3.25 | 0.001
Because our predictors are standardized and our outcome is raw, the coefficients represent changes in raw math points for a 1-SD increase.
Level One (Observation)
\[ y_{ij} = \beta_{0j} + \beta_{1j} x_{ij}^{(wc)} + \color{#4daf4a}{e_{ij}} \]
Level Two (Cluster)
\[ \begin{align} \beta_{0j} &= \color{#377eb8}{\gamma_{00}} + \color{#377eb8}{\gamma_{01}} x_j^{(bc)} + \color{#e41a1c}{u_{0j}} \\ \beta_{1j} &= \color{#377eb8}{\gamma_{10}} + \underbrace{\color{#377eb8}{\gamma_{11}} x_j^{(bc)}}_{\text{New}} + \color{#e41a1c}{u_{1j}} \end{align} \]
Substitute the L2 equations directly into the L1 equation:
\[ y_{ij} = \underbrace{\color{#377eb8}{\gamma_{00}} + \color{#377eb8}{\gamma_{01}} x_j^{(bc)} + \color{#377eb8}{\gamma_{10}} x_{ij}^{(wc)} + \color{#377eb8}{\gamma_{11}} x_j^{(bc)} x_{ij}^{(wc)}}_{\text{Fixed}} + \dots \] \[ \dots + \underbrace{\color{#e41a1c}{u_{0j}} + \color{#e41a1c}{u_{1j}} x_{ij}^{(wc)}}_{\text{Random}} + \color{#4daf4a}{e_{ij}} \]
Formula Syntax
y ~ 1 + x_between * x_within + (1 + x_within | cluster)
Our Example
math ~ 1 + ses_bc * ses_wc + (1 + ses_wc | school)
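Again sketching the fit under the same assumptions ({lme4}, joined data in a hypothetical dat_l1); the `*` operator expands to both main effects plus their product:

```r
library(lme4)  # assumed

fit_x <- lmer(
  math ~ 1 + ses_bc * ses_wc + (1 + ses_wc | school),
  data = dat_l1  # hypothetical name for the joined Level 1 dataset
)
```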
# Fixed Effects
Parameter | Coefficient | SE | 95% CI | t(6863) | p
------------------------------------------------------------------------
(Intercept) | 57.67 | 0.12 | [57.42, 57.91] | 465.72 | < .001
ses_bc | 5.88 | 0.25 | [ 5.39, 6.38] | 23.25 | < .001
ses_wc | 3.17 | 0.17 | [ 2.84, 3.50] | 18.84 | < .001
ses_bc × ses_wc | -0.29 | 0.37 | [-1.02, 0.45] | -0.77 | 0.444
Estimated Marginal Effects
ses_bc | Slope | SE | 95% CI | t(6863) | p
-------------------------------------------------------
-0.32 | 3.26 | 0.19 | [2.88, 3.64] | 16.82 | < .001
-0.03 | 3.18 | 0.17 | [2.85, 3.51] | 19.00 | < .001
0.28 | 3.09 | 0.21 | [2.68, 3.50] | 14.81 | < .001
Marginal effects estimated for ses_wc
Type of slope was dY/dX
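A table like the one above can be produced with, for example, {emmeans} (a sketch; the helper actually used in the slides is not shown, fit_x is a hypothetical name for the interaction model, and the probe values are the ones printed above):

```r
library(emmeans)  # assumed; the slides may use a different helper

# Simple slopes of ses_wc at low, mean, and high values of ses_bc
emtrends(
  fit_x, ~ ses_bc, var = "ses_wc",
  at = list(ses_bc = c(-0.32, -0.03, 0.28))
)
```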
The positive effect of relative SES remains stable and significant across schools with low, medium, and high average SES
Finally, we compare with standardized (s) predictors
Level One (Observation)
\[ y_{ij} = \beta_{0j} + \beta_{1j} x_{ij}^{(ws)} + \color{#4daf4a}{e_{ij}} \]
Level Two (Cluster)
\[ \begin{align} \beta_{0j} &= \color{#377eb8}{\gamma_{00}} + \color{#377eb8}{\gamma_{01}} x_j^{(bs)} + \color{#e41a1c}{u_{0j}} \\ \beta_{1j} &= \color{#377eb8}{\gamma_{10}} + \color{#377eb8}{\gamma_{11}} x_j^{(bs)} + \color{#e41a1c}{u_{1j}} \end{align} \]
# Fixed Effects
Parameter | Coefficient | SE | 95% CI | t(6863) | p
------------------------------------------------------------------------
(Intercept) | 57.67 | 0.12 | [57.42, 57.91] | 465.72 | < .001
ses_bs | 2.86 | 0.12 | [ 2.62, 3.10] | 23.25 | < .001
ses_ws | 1.92 | 0.10 | [ 1.72, 2.12] | 18.84 | < .001
ses_bs × ses_ws | -0.08 | 0.11 | [-0.30, 0.13] | -0.77 | 0.444
Because our predictors are standardized and our outcome is raw, the coefficients represent changes in raw math points for a 1-SD increase