Multilevel Modeling

Repeated Measures

Spring 2026 | CLAS | PSYC 894
Jeffrey M. Girard | Lecture 11a

Roadmap

  1. Within-Person Variance
    • Trial-level Analysis
    • Task-by-Person Effects
  2. The Structure of Error
    • Autocorrelation Tests
    • Drift vs. White Noise

Repeated Measures

Overview

  • Repeated measures designs are ubiquitous in social science
    • e.g., completing multiple trials of a task
    • e.g., responding to within-person experimental conditions
    • e.g., answering daily diary surveys
  • These are perfect applications for multilevel modeling
    • Instead of individuals (L1) within larger units (L2)…
    • …we have observations (L1) within individuals (L2)

The Base Example

Main Variables

  • Outcome = performance (numeric score on a trial)
  • L2 Predictor = icontrol (trait inhibitory control)

Research Question

  • Does a person’s trait inhibitory control predict their average intellectual performance across multiple trials?

Example Data

library(tidyverse)
library(glmmTMB)
library(easystats)

intell <- read_csv("intell_sim.csv")
glimpse(intell)
Rows: 1,800
Columns: 5
$ subject     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2…
$ icontrol    <dbl> -0.0848893, -0.0848893, -0.0848893, -0.0848893, -0.0848893…
$ trial       <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7…
$ task        <chr> "Perceptual Reasoning", "Processing Speed", "Working Memor…
$ performance <dbl> 17.997978, 14.412448, 22.439953, 17.173733, 21.686384, 14.…

Each of 150 (fake) subjects completed 12 experimental trials.

The Aggregation Approach

Historically, researchers would simply average the L1 observations to create a single L2 score for each person, then run a standard linear regression.

# Collapse L1 to L2
intell_avg <- 
  intell |> 
  summarize(
    .by = subject,
    icontrol = first(icontrol),
    perf_avg = mean(performance, na.rm = TRUE)
  )
# Fit single-level model
fit_avg <- lm(perf_avg ~ icontrol, data = intell_avg)
model_parameters(fit_avg)
Parameter   | Coefficient |   SE |         95% CI | t(148) |      p
-------------------------------------------------------------------
(Intercept) |       11.05 | 0.48 | [10.10, 12.00] |  22.97 | < .001
icontrol    |        5.47 | 0.49 | [ 4.49,  6.44] |  11.09 | < .001

Why Multilevel is Better

  • Information Loss: Averaging collapses all L1 variance into a single point.
    • We lose the ability to analyze within-person fluctuation or add L1 predictors.
  • Unbalanced Data: If some subjects have missing trials, simple aggregation ignores the sample size differences.
    • A mean based on 2 trials is treated exactly the same as a mean based on 10 trials.
  • Reliability Weighting: MLM uses partial pooling (shrinkage).
    • Estimates for subjects with fewer or more variable observations are mathematically pulled closer to the group average to prevent overfitting.
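The shrinkage idea above can be sketched with a few lines of base R. This is a simplified illustration with hypothetical values (not taken from the example data): the weight given to a person's own mean depends on the between-person variance, the within-person variance, and how many trials that person completed.

```r
# Sketch of partial pooling (all values hypothetical, not from intell_sim.csv):
# lambda is the "reliability" of the person mean; the shrunken estimate is a
# weighted average of the person mean and the grand mean.
shrink <- function(person_mean, grand_mean, tau00, sigma, n) {
  lambda <- tau00^2 / (tau00^2 + sigma^2 / n)
  lambda * person_mean + (1 - lambda) * grand_mean
}

# A person mean based on 2 trials is pulled harder toward the grand mean
# than the same person mean based on 10 trials.
shrink(person_mean = 15, grand_mean = 11, tau00 = 5, sigma = 5, n = 2)   # 13.67
shrink(person_mean = 15, grand_mean = 11, tau00 = 5, sigma = 5, n = 10)  # 14.64
```

Note how the estimate with more trials stays closer to the raw person mean of 15, while the estimate based on only 2 trials is pulled further toward the grand mean of 11.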

Multilevel Equation

Level One (Observation)

\[ \begin{align} y_{ti} &= \beta_{0i} + \color{#4daf4a}{e_{ti}} \\ \color{#4daf4a}{e_{ti}} &\sim \text{Normal}(0, \color{#4daf4a}{\sigma}) \end{align} \]

Level Two (Cluster)

\[ \begin{align} \beta_{0i} &= \color{#377eb8}{\gamma_{00}} + \color{#377eb8}{\gamma_{01}} z_{i} + \color{#e41a1c}{u_{0i}} \\ \color{#e41a1c}{u_{0i}} &\sim \text{Normal}(0, \color{#e41a1c}{\tau_{00}}) \end{align} \]

Note that we have changed from \(i\) within \(j\) to \(t\) within \(i\)

Path Diagram

Mixed (or Reduced) Equation

\[ \begin{alignat}{2} &\text{L1:} \quad & y_{ti} &= \beta_{0i} + \color{#4daf4a}{e_{ti}} \\ &\text{L2:} \quad & \beta_{0i} &= \color{#377eb8}{\gamma_{00}} + \color{#377eb8}{\gamma_{01}} z_i + \color{#e41a1c}{u_{0i}} \\ \end{alignat} \]

Substitute the L2 equations directly into the L1 equation:

\[ y_{ti} = (\color{#377eb8}{\gamma_{00}} + \color{#377eb8}{\gamma_{01}} z_i + \color{#e41a1c}{u_{0i}}) + \color{#4daf4a}{e_{ti}} \]

Rearrange to group the fixed and random parts:

\[ y_{ti} = \underbrace{\color{#377eb8}{\gamma_{00}} + \color{#377eb8}{\gamma_{01}} z_i}_{\text{Fixed}} + \underbrace{\color{#e41a1c}{u_{0i}}}_{\text{Random}} + \color{#4daf4a}{e_{ti}} \]

Formula

\[ y_{ti} = \underbrace{\color{#377eb8}{\gamma_{00}} + \color{#377eb8}{\gamma_{01}} z_i}_{\text{Fixed}} + \underbrace{\color{#e41a1c}{u_{0i}}}_{\text{(Random)}} + \color{#4daf4a}{e_{ti}} \]

Generic

y ~ 1 + z + (1 | cluster)

where the 1 + z part is fixed
and the (1 | cluster) part is random

Example

performance ~ 1 + icontrol + (1 | subject)

Fitting the Model

fit_mlm <- glmmTMB(
  formula = performance ~ 1 + icontrol + (1 | subject),
  data = intell
)
model_parameters(fit_mlm, effects = "all")
# Fixed Effects

Parameter   | Coefficient |   SE |         95% CI |     z |      p
------------------------------------------------------------------
(Intercept) |       11.05 | 0.48 | [10.11, 11.98] | 23.13 | < .001
icontrol    |        5.47 | 0.49 | [ 4.51,  6.43] | 11.16 | < .001

# Random Effects

Parameter               | Coefficient
-------------------------------------
SD (Intercept: subject) |        5.66
SD (Residual)           |        4.71

Visualizing the Results

estimate_relation(fit_mlm, by = "icontrol") |> plot() + labs(y = "performance")

Interpreting the Model

Fixed Effects (The Average Trend)

  • (Intercept) (\(\gamma_{00}\)): The expected performance on a trial for a subject with an inhibitory control score of 0.
  • icontrol (\(\gamma_{01}\)): The expected difference in average performance between subjects who differ by 1 unit in inhibitory control.

Random Effects (The Variance)

  • SD (Intercept) (\(\tau_{00}\)): The standard deviation of the true person-specific means around the overall regression line (L2 variance).
  • SD (Residual) (\(\sigma\)): The standard deviation of the individual trials around the person-specific means (L1 variance).

Unconditional ICC

fit0 <- glmmTMB(performance ~ 1 + (1 | subject), data = intell)
icc(fit0)
# Intraclass Correlation Coefficient

    Adjusted ICC: 0.731
  Unadjusted ICC: 0.731

The ICC tells us the proportion of total variance in performance that is attributable to stable differences between people. The remaining variance (1 − ICC) is the within-person variability across trials.
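As a sanity check on the formula behind icc(), we can plug variance components in by hand. Here we use the random-effect SDs reported earlier for fit_mlm; because that model conditions on icontrol, this yields fit_mlm's adjusted ICC rather than fit0's unconditional 0.731.

```r
# ICC = tau00^2 / (tau00^2 + sigma^2), using fit_mlm's reported SDs
tau00 <- 5.66  # SD (Intercept: subject)
sigma <- 4.71  # SD (Residual)
tau00^2 / (tau00^2 + sigma^2)  # approx 0.59, conditional on icontrol
```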

Adding L1 Predictors

The “Many Tasks” Example

We can expand our model by adding predictors at Level 1. What if those repeated trials actually came from four different experimental tasks?

Updated Variables

  • Outcome = performance (numeric)
  • L1 Predictors = task (factor with 4 levels)
  • L2 Predictors = icontrol (numeric)

Research Question

  • Some theories say inhibitory control increases all aspects of intellectual performance, while others say it only improves working memory.
  • Which pattern do the data support?

Multilevel Equation Part 1

Level One (Observation)

\[y_{ti} = \beta_{0i} + \beta_{1i} \text{TASK2}_{ti} + \beta_{2i} \text{TASK3}_{ti} + \beta_{3i} \text{TASK4}_{ti} + \color{#4daf4a}{e_{ti}}\]

Level Two (Cluster)

\[\begin{align} \beta_{0i} &= \color{#377eb8}{\gamma_{00}} + \color{#377eb8}{\gamma_{01}} \text{ICTL}_{i} + \color{#e41a1c}{u_{0i}} \\ \beta_{1i} &= \color{#377eb8}{\gamma_{10}} + \color{#377eb8}{\gamma_{11}} \text{ICTL}_{i} + \color{#e41a1c}{u_{1i}} \\ \beta_{2i} &= \color{#377eb8}{\gamma_{20}} + \color{#377eb8}{\gamma_{21}} \text{ICTL}_{i} + \color{#e41a1c}{u_{2i}} \\ \beta_{3i} &= \color{#377eb8}{\gamma_{30}} + \color{#377eb8}{\gamma_{31}} \text{ICTL}_{i} + \color{#e41a1c}{u_{3i}} \\ \end{align}\]

Multilevel Equation Part 2

Level One (Observation)

\[ \color{#4daf4a}{e_{ti}} \sim \text{Normal}(0, \color{#4daf4a}{\sigma}) \]

Level Two (Cluster)

\[\begin{bmatrix} \color{#e41a1c}{u_{0i}}\\ \color{#e41a1c}{u_{1i}}\\ \color{#e41a1c}{u_{2i}}\\ \color{#e41a1c}{u_{3i}} \end{bmatrix} \sim \text{MVN}\left( \begin{bmatrix} 0\\ 0\\ 0\\ 0 \end{bmatrix}, \begin{bmatrix} \color{#e41a1c}{\tau_{00}} & & & \\ \color{#e41a1c}{\tau_{10}} & \color{#e41a1c}{\tau_{11}} & & \\ \color{#e41a1c}{\tau_{20}} & \color{#e41a1c}{\tau_{21}} & \color{#e41a1c}{\tau_{22}} & \\ \color{#e41a1c}{\tau_{30}} & \color{#e41a1c}{\tau_{31}} & \color{#e41a1c}{\tau_{32}} & \color{#e41a1c}{\tau_{33}} \end{bmatrix} \right)\]

Path Diagram

Mixed (or Reduced) Equation

Substitute the L2 equations directly into the L1 equation:

\[\begin{align} y_{ti} &= (\color{#377eb8}{\gamma_{00}} + \color{#377eb8}{\gamma_{01}} \text{ICTL}_{i} + \color{#e41a1c}{u_{0i}}) \\ &\quad + (\color{#377eb8}{\gamma_{10}} + \color{#377eb8}{\gamma_{11}} \text{ICTL}_{i} + \color{#e41a1c}{u_{1i}})\text{TASK2}_{ti} \\ &\quad + (\color{#377eb8}{\gamma_{20}} + \color{#377eb8}{\gamma_{21}} \text{ICTL}_{i} + \color{#e41a1c}{u_{2i}})\text{TASK3}_{ti} \\ &\quad + (\color{#377eb8}{\gamma_{30}} + \color{#377eb8}{\gamma_{31}} \text{ICTL}_{i} + \color{#e41a1c}{u_{3i}})\text{TASK4}_{ti} + \color{#4daf4a}{e_{ti}} \end{align}\]

Rearrange to group the fixed and random parts:

\[\begin{align} y_{ti} &= \underbrace{\color{#377eb8}{\gamma_{00}} + \color{#377eb8}{\gamma_{01}}\text{ICTL}_{i} + \color{#377eb8}{\gamma_{10}}\text{TASK2}_{ti} + \dots + \color{#377eb8}{\gamma_{31}}\text{ICTL}_{i}\text{TASK4}_{ti}}_{\text{Fixed}} \\ &\quad + \underbrace{\color{#e41a1c}{u_{0i}} + \color{#e41a1c}{u_{1i}}\text{TASK2}_{ti} + \color{#e41a1c}{u_{2i}}\text{TASK3}_{ti} + \color{#e41a1c}{u_{3i}}\text{TASK4}_{ti}}_{\text{Random}} + \color{#4daf4a}{e_{ti}} \end{align}\]

Formula

Generic

y ~ 1 + w * z + (1 + w | cluster)

where the 1 + w * z part is fixed
and the (1 + w | cluster) part is random

Example

performance ~ 1 + task * icontrol
+ (1 + task | subject)

Fitting the Model

fit1 <- glmmTMB(
  formula = performance ~ 1 + task * icontrol + (1 + task | subject),
  data = intell
)
model_parameters(fit1, effects = "fixed")
# Fixed Effects

Parameter                          | Coefficient |   SE |        95% CI |     z |      p
----------------------------------------------------------------------------------------
(Intercept)                        |        7.71 | 0.53 | [ 6.68, 8.75] | 14.62 | < .001
task [Processing Speed]            |        2.92 | 0.41 | [ 2.12, 3.72] |  7.17 | < .001
task [Verbal Reasoning]            |        2.61 | 0.31 | [ 2.00, 3.22] |  8.34 | < .001
task [Working Memory]              |        7.81 | 0.38 | [ 7.07, 8.54] | 20.81 | < .001
icontrol                           |        4.89 | 0.54 | [ 3.83, 5.95] |  9.05 | < .001
task [Processing Speed] × icontrol |        0.11 | 0.42 | [-0.70, 0.93] |  0.27 | 0.784 
task [Verbal Reasoning] × icontrol |        0.10 | 0.32 | [-0.53, 0.73] |  0.32 | 0.749 
task [Working Memory] × icontrol   |        2.07 | 0.38 | [ 1.32, 2.83] |  5.39 | < .001

Simple Slopes Analysis

estimate_slopes(fit1, trend = "icontrol", by = "task")
Estimated Marginal Effects

task                 | Slope |   SE |       95% CI |     z |      p
-------------------------------------------------------------------
Perceptual Reasoning |  4.89 | 0.54 | [3.83, 5.95] |  9.05 | < .001
Processing Speed     |  5.01 | 0.60 | [3.83, 6.19] |  8.33 | < .001
Verbal Reasoning     |  5.00 | 0.46 | [4.10, 5.89] | 10.91 | < .001
Working Memory       |  6.97 | 0.54 | [5.91, 8.02] | 12.96 | < .001

Marginal effects estimated for icontrol
Type of slope was dY/dX
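These simple slopes are just linear combinations of the fixed effects from the table above. For example, the Working Memory slope equals the reference-task (Perceptual Reasoning) icontrol slope plus the Working Memory × icontrol interaction:

```r
# Working Memory simple slope = reference slope + interaction coefficient
4.89 + 2.07  # matches the 6.97 from estimate_slopes(), up to rounding
```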

Visualize Results

estimate_relation(fit1, by = c("icontrol", "task")) |> plot()

Interpreting Results

  1. Inhibitory control was positively associated with performance in all four tasks (see simple slopes analysis), which supports the first theory.
  2. However, the effect was strongest by far in the working memory task, as evidenced in the marginal effects plot. This may explain where the second theory came from.
  3. These (fake) results suggest a hybrid theory where inhibitory control helps everything somewhat, but helps working memory the most.

Addressing Autocorrelation

The Independence Assumption

  • MLM assumes that the Level 1 residuals (\(e_{ti}\)) are independent after accounting for the cluster variances (\(u_{0i}\)).
  • But what if the observations happen in a sequence over time?
  • Even among observations in the same cluster, trials that are closer together in time might be more similar to one another.
  • This creates autocorrelation, violating our assumption.
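A quick base-R simulation (not from the lecture data) shows what autocorrelated residuals look like: each residual is partly determined by the one before it, so the lag-1 correlation recovers the AR(1) parameter.

```r
# Simulate an AR(1) residual series with rho = 0.6 (hypothetical value)
set.seed(1)
rho <- 0.6
n <- 1000
v <- numeric(n)
v[1] <- rnorm(1)
for (t in 2:n) {
  # each residual carries over rho of the previous one, plus fresh noise
  v[t] <- rho * v[t - 1] + rnorm(1, sd = sqrt(1 - rho^2))
}
cor(v[-n], v[-1])  # lag-1 correlation, close to rho
```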

Incorporating Time

To test for and model this, we need a variable that represents the sequence of observations. Luckily, our dataset has a trial column that tracks the randomized order in which the tasks were completed by each subject.

intell |> print(n = 6)
# A tibble: 1,800 × 5
  subject icontrol trial task                 performance
    <dbl>    <dbl> <dbl> <chr>                      <dbl>
1       1  -0.0849     1 Perceptual Reasoning        18.0
2       1  -0.0849     2 Processing Speed            14.4
3       1  -0.0849     3 Working Memory              22.4
4       1  -0.0849     4 Working Memory              17.2
5       1  -0.0849     5 Working Memory              21.7
6       1  -0.0849     6 Perceptual Reasoning        14.8
# ℹ 1,794 more rows

Checking Autocorrelation

We can perform a formal test on the residuals of our model to see if this sequence matters.

check_autocorrelation(fit1)
Warning: Autocorrelated residuals detected (p < .001).

Because this flags a violation, we should refit the model and explicitly account for the correlation between adjacent trials.

Partitioning the Error

To fix the autocorrelation, we split our unmeasured error into two piles:

\[y_{ti} = \text{Fixed} + \text{Random} + \color{#4daf4a}{(v_{ti} + e_{ti})}\]

  • White Noise: The independent, unpredictable noise for a specific trial. \[\color{#4daf4a}{e_{ti}} \sim \text{Normal}(0, \color{#4daf4a}{\sigma_e})\]
  • Autocorrelated Error: The portion of the error that correlates with adjacent trials. Adding this requires two new parameters: the amount of this error \((\sigma_v)\) and the strength of the correlation \((\rho)\). \[\color{#4daf4a}{\mathbf{v}_i} \sim \text{AR1}(0, \color{#4daf4a}{\sigma_v}, \color{#4daf4a}{\rho})\]
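The AR(1) structure implies that the covariance between two trials decays geometrically with their lag: \(\text{Cov}(v_{ti}, v_{si}) = \sigma_v^2 \, \rho^{|t-s|}\). A small sketch (with hypothetical \(\sigma_v\) and \(\rho\), not the fitted values) builds this covariance matrix explicitly:

```r
# Implied AR(1) covariance among one person's trials:
# variance sigma_v^2 on the diagonal, decaying by rho per unit of lag
ar1_cov <- function(n_trials, sigma_v, rho) {
  lags <- abs(outer(1:n_trials, 1:n_trials, "-"))
  sigma_v^2 * rho^lags
}
ar1_cov(4, sigma_v = 2, rho = 0.5)
# diagonal = 4; adjacent trials covary at 2; trials two apart at 1; etc.
```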

Prepping the Time Factor

intell <- 
  intell |> 
  # Sort by trial within subject
  arrange(subject, trial) |> 
  # Create a factor for ar1()
  mutate(time = factor(trial)) |> 
  print(n = 6)
# A tibble: 1,800 × 6
  subject icontrol trial task                 performance time 
    <dbl>    <dbl> <dbl> <chr>                      <dbl> <fct>
1       1  -0.0849     1 Perceptual Reasoning        18.0 1    
2       1  -0.0849     2 Processing Speed            14.4 2    
3       1  -0.0849     3 Working Memory              22.4 3    
4       1  -0.0849     4 Working Memory              17.2 4    
5       1  -0.0849     5 Working Memory              21.7 5    
6       1  -0.0849     6 Perceptual Reasoning        14.8 6    
# ℹ 1,794 more rows

Fitting the AR(1) Model

In glmmTMB, we handle this by simply adding an ar1() term to our random effects. The software will automatically estimate \(\sigma_v\) and \(\rho\) for us.

fit_ar1 <- glmmTMB(
  formula = performance ~ 
    1 + task * icontrol +    # fixed effects
    (1 + task | subject) +   # random effects
    ar1(0 + time | subject), # ar1 by time (no intercept)
  data = intell
)

Comparing Fixed Effects

compare_parameters(fit1, fit_ar1, effects = "fixed", select = "{estimate} ({se})")
Parameter                          |        fit1 |     fit_ar1
--------------------------------------------------------------
(Intercept)                        | 7.71 (0.53) | 7.68 (0.52)
task [Processing Speed]            | 2.92 (0.41) | 2.91 (0.40)
task [Verbal Reasoning]            | 2.61 (0.31) | 2.66 (0.30)
task [Working Memory]              | 7.81 (0.38) | 7.87 (0.36)
icontrol                           | 4.89 (0.54) | 4.86 (0.54)
task [Processing Speed] × icontrol | 0.11 (0.42) | 0.22 (0.41)
task [Verbal Reasoning] × icontrol | 0.10 (0.32) | 0.09 (0.31)
task [Working Memory] × icontrol   | 2.07 (0.38) | 2.16 (0.37)
--------------------------------------------------------------
Observations                       |        1800 |        1800

Our estimates have shifted. Accounting for autocorrelation re-weights the data, treating correlated adjacent trials as partially redundant.

Did the model improve?

test_lrt(fit1, fit_ar1)
# Likelihood-Ratio-Test (LRT) for Model Comparison (ML-estimator)

Name    |   Model | df | df_diff |   Chi2 |      p
--------------------------------------------------
fit1    | glmmTMB | 19 |         |        |       
fit_ar1 | glmmTMB | 21 |       2 | 205.58 | < .001

Yes! Adding the two parameters for the Autocorrelated Error (the amount and the strength) resulted in a significant improvement in model fit.
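The p-value in the LRT table is just the upper tail of a chi-square distribution with 2 degrees of freedom (one df each for \(\sigma_v\) and \(\rho\)), which we can verify by hand:

```r
# Upper-tail chi-square probability for the reported test statistic
pchisq(205.58, df = 2, lower.tail = FALSE)  # far below .001
```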

AR1 Parameter Estimates

VarCorr(fit_ar1)

Conditional model:
 Groups    Name                 Std.Dev. Corr                      
 subject   (Intercept)          6.0856                             
           taskProcessing Speed 4.4658   -0.205                    
           taskVerbal Reasoning 3.0375   -0.569       0.625        
           taskWorking Memory   3.9521   -0.337       0.627  0.703 
 subject.1 time1                2.8056   0.606 (ar1)               
 Residual                       1.3276                             

Look to the subject.1 time1 row:

  • \(\color{#4daf4a}{\sigma_v}=2.8056\): The amount of autocorrelated error.
  • \(\color{#4daf4a}{\rho}=0.606\): The strength of the autocorrelation.

A Warning About AR(1)

Discrete AR(1) models make a key assumption: Time steps are integers.

  • ar1() can handle missing observations (e.g., Trials 1, 2, and 4) if we carefully format the factor levels to explicitly include the missing Trial 3.
  • But what happens in the real world when time isn’t a simple integer sequence?
    • A subject completes Survey 1 at 9:00 AM.
    • They complete Survey 2 at 10:30 AM (1.5 hours later).
    • They complete Survey 3 at 6:00 PM (7.5 hours later).

The Continuous Time Problem

  • Because ar1() requires a discrete factor, we can’t easily pass it continuous or fractional time gaps (1.5 vs. 7.5 hours).
  • If we just force those surveys into consecutive levels (1, 2, and 3), the AR(1) model blindly treats the 1.5-hour jump exactly the same as the 7.5-hour jump.
  • Next Lecture: We will dive into Longitudinal Data and learn how to use continuous-time autocorrelation structures (like Spatial Exponential) to handle real-world fractional time!
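As a preview of why those fractional gaps matter: under a continuous-time (exponential) correlation structure, correlation decays as \(\rho^{\text{gap}}\), so the two survey gaps above imply very different residual correlations. A tiny illustration with a hypothetical \(\rho\):

```r
# Correlation decay under a continuous-time structure (rho = 0.6 is hypothetical)
rho <- 0.6
rho^1.5  # correlation after a 1.5-hour gap (about 0.46)
rho^7.5  # correlation after a 7.5-hour gap (about 0.02)
```

A discrete AR(1) that treats both gaps as "one step apart" would assign them the same correlation, which is exactly the problem continuous-time structures solve.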