
School Attainment Multilevel Model Results

Multilevel analysis of KS4 Attainment 8 scores across English secondary schools

Author

Adam Dennett with some assistance from Claude Code

Published

April 21, 2026

1 Data Overview

For full details on data sources, panel construction, variable definitions, descriptive statistics, distributions, and bivariate relationships, see the Data Overview report.

Tip: Model code & case study

The full lme4 code for all model specifications (Analyses A–E) is available in the Model Experiments report, which provides human-readable, self-contained implementations for manual inspection and reproducibility. A detailed application of the model to a single local authority — including maps, non-linear effect interpretation, and policy recommendations — is presented in the Brighton & Hove Case Study.

Note: Sample size note

The full 9-predictor models (Analyses A, B, and E) are fitted on approximately 12,200 school-year observations from ~13,000 mainstream (Academy + Maintained) school-year records. The main source of attrition is the average_number_of_days_taken (teacher sickness absence) variable, which drops ~570 observations — predominantly in 2021-22 and 2022-23 where the workforce census sickness data was patchier (likely due to COVID-era reporting disruption). Around 420 of these are NA values and ~150 are zeros excluded by the > 0 requirement for log transformation. Academies account for ~80% of the losses. See the dropped-rows diagnostic in the Model Experiments report for a full breakdown.

2 Measuring Model Fit: Marginal and Conditional R²

Throughout this report we summarise the goodness-of-fit of each mixed-effects model using two quantities: marginal R² (\(R^2_m\)) and conditional R² (\(R^2_c\)). These are not the default fit statistics most readers will encounter in linear mixed-effects modelling, so this short section explains what they are, how they are calculated, and how they relate to the measures that are more commonly reported (log-likelihood, AIC, BIC, ICC).

2.1 Why not just use the ordinary R²?

In ordinary least-squares regression, \(R^2\) is unambiguous: it is the proportion of variance in the outcome explained by the fitted model, \(1 - \mathrm{SS}_{\text{resid}}/\mathrm{SS}_{\text{total}}\). The intuition — “the model explains \(X\%\) of the variation” — is easy to communicate to a non-technical audience, which is why it remains so widely used.

In a linear mixed-effects model (LMM), this simple decomposition breaks down. The total variance in the outcome is no longer split neatly between “explained by the model” and “residual”: it is partitioned into a fixed-effects component, one or more random-effects components (here: year, Ofsted rating, region, and local authority), and a residual. There is no single residual sum of squares to compare against, and naïvely plugging LMM predictions into the OLS formula gives a number that can misbehave (e.g. exceed 1, or decrease when predictors are added). For this reason, LMM output from lme4 reports likelihood-based fit measures — log-likelihood, AIC, BIC, deviance, REML criterion — rather than an \(R^2\). These are excellent for comparing nested or non-nested models, but they are not on the “percent of variance explained” scale and therefore do not translate easily for readers accustomed to OLS.

2.2 The Nakagawa & Schielzeth decomposition

Nakagawa and Schielzeth (2013), extended by Nakagawa, Johnson and Schielzeth (2017), provide a variance-partitioning definition of \(R^2\) that does generalise to mixed models. The total variance is decomposed as:

\[ \sigma^2_{\text{total}} \;=\; \underbrace{\sigma^2_f}_{\text{fixed effects}} \;+\; \underbrace{\textstyle\sum_{k} \sigma^2_{u_k}}_{\text{random effects}} \;+\; \underbrace{\sigma^2_{\varepsilon}}_{\text{residual}} \]

where \(\sigma^2_f = \mathrm{var}(\mathbf{X}\boldsymbol{\beta})\) is the variance of the linear predictor from the fixed effects alone, \(\sigma^2_{u_k}\) is the variance component estimated for each random intercept (here: year, Ofsted, region, LA-within-region), and \(\sigma^2_{\varepsilon}\) is the residual variance. From this, two \(R^2\) quantities follow naturally:

\[ R^2_m \;=\; \frac{\sigma^2_f}{\sigma^2_f + \sum_k \sigma^2_{u_k} + \sigma^2_{\varepsilon}} \qquad\qquad R^2_c \;=\; \frac{\sigma^2_f + \sum_k \sigma^2_{u_k}}{\sigma^2_f + \sum_k \sigma^2_{u_k} + \sigma^2_{\varepsilon}} \]

  • Marginal R² (\(R^2_m\)) is the share of total variance explained by the fixed effects only — i.e. by the predictors (Ofsted rating, FSM, EAL, teacher sickness, etc.). It answers: “How much of the between-school variation can I attribute to the measured characteristics of the school?”
  • Conditional R² (\(R^2_c\)) is the share of total variance explained by the fixed and random effects together — i.e. by the predictors plus the structural grouping (year, Ofsted, region, LA). It answers: “How much of the variation can I account for once I also allow systematic differences between years, regions, and local authorities?”

By construction, \(R^2_c \geq R^2_m\), and the difference \(R^2_c - R^2_m\) is the share of variance absorbed by the random effects — i.e. by the grouping structure itself. This is conceptually very close to the intraclass correlation coefficient (ICC), which is a common LMM summary: in a model with a single random intercept and no fixed effects, \(R^2_c - R^2_m\) reduces exactly to the ICC.
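This reduction to the ICC is easy to verify empirically. A minimal sketch using lme4's built-in sleepstudy data (not the school panel used in this report):

```r
library(lme4)
library(performance)

# Intercept-only model: no fixed-effect predictors, one random intercept
m0 <- lmer(Reaction ~ 1 + (1 | Subject), data = sleepstudy)

r2(m0)   # R2m is 0 (no fixed effects), so R2c - R2m = R2c
icc(m0)  # the adjusted ICC matches R2c here
```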

All \(R^2\) values reported in this document are computed with performance::r2(), which implements the Nakagawa & Schielzeth definitions above.
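For readers who want to see the decomposition mechanically, the sketch below reproduces \(R^2_m\) and \(R^2_c\) by hand from lme4's variance components and checks them against performance::r2(). It uses the package's built-in sleepstudy data rather than the school panel:

```r
library(lme4)
library(performance)

m <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# sigma^2_f: variance of the fixed-effects linear predictor X %*% beta
var_f <- var(as.numeric(model.matrix(m) %*% fixef(m)))

# sum_k sigma^2_{u_k}: random-effect variance components from VarCorr()
vc    <- as.data.frame(VarCorr(m))
var_u <- sum(vc$vcov[vc$grp != "Residual"])

# sigma^2_eps: residual variance
var_e <- sigma(m)^2

r2m_manual <- var_f / (var_f + var_u + var_e)
r2c_manual <- (var_f + var_u) / (var_f + var_u + var_e)

r2(m)  # performance::r2() should agree closely with the manual values
```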

2.2.1 Isn’t this just the squared correlation between observed and fitted values?

A reasonable first instinct — and the one that gives the “right answer” in OLS — is to simply produce fitted values \(\hat{y}\) from the model and compute \(\mathrm{cor}(y, \hat{y})^2\). In an ordinary linear regression this reproduces \(R^2\) exactly. In a linear mixed-effects model it approximates the Nakagawa quantities but is not the same thing, and the differences are worth being explicit about:

  • Which fitted values? An LMM has two natural flavours of prediction. The marginal prediction uses fixed effects only, \(\hat{y}_{\text{marg}} = \mathbf{X}\hat{\boldsymbol{\beta}}\); the conditional prediction adds the random-effect contributions (the BLUPs), \(\hat{y}_{\text{cond}} = \mathbf{X}\hat{\boldsymbol{\beta}} + \mathbf{Z}\hat{\mathbf{u}}\). Squaring the correlation with observed \(y\) therefore gives you two different numbers depending on which you use — there is no single “fitted value” the way there is in OLS. \(R^2_m\) and \(R^2_c\) correspond (roughly) to these two cases.
  • Variance components vs. sample variances. The Nakagawa definition uses the model-estimated variance components (\(\sigma^2_f\), \(\sigma^2_{u_k}\), \(\sigma^2_{\varepsilon}\)) in the denominator, whereas \(\mathrm{cor}(y, \hat{y})^2\) uses the empirical variance of the observed \(y\). If the model is well-specified these are similar, but they will not be identical in finite samples.
  • Shrinkage bites the conditional version. The random-effect predictions \(\hat{\mathbf{u}}\) returned by lme4 are BLUPs — best linear unbiased predictors, which are deliberately shrunk toward zero for groups with little data. As a result, \(\mathrm{var}(\mathbf{Z}\hat{\mathbf{u}})\) is systematically smaller than the estimated random-effect variance \(\sum_k \sigma^2_{u_k}\). Computing \(\mathrm{cor}(y, \hat{y}_{\text{cond}})^2\) therefore tends to under-state the full explanatory reach of the model relative to \(R^2_c\). Nakagawa’s \(R^2_c\) sidesteps this by using the raw variance components directly.
  • For the marginal version, the two are very close. \(\mathrm{cor}(y, \mathbf{X}\hat{\boldsymbol{\beta}})^2\) and \(R^2_m\) will usually agree to within rounding when the model is well-specified and the sample is large, because no shrinkage issue arises for the fixed-effects prediction.

So the squared correlation is a perfectly sensible back-of-the-envelope sanity check — and you will see it used in some of the observed-vs-predicted plots later in this report — but it is not a substitute for the variance-partitioning \(R^2_m\) / \(R^2_c\), which have a clean, theoretically grounded decomposition and behave predictably as predictors and random effects are added to the model.
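The contrast between the two squared correlations is straightforward to demonstrate. A sketch on lme4's sleepstudy data, using predict(..., re.form = NA) for the marginal (fixed-effects-only) prediction:

```r
library(lme4)
library(performance)

m <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
y <- sleepstudy$Reaction

yhat_marg <- predict(m, re.form = NA)  # fixed effects only
yhat_cond <- fitted(m)                 # fixed effects + BLUPs

cor(y, yhat_marg)^2  # close to R2m
cor(y, yhat_cond)^2  # affected by BLUP shrinkage; not identical to R2c
r2(m)                # the variance-partitioning quantities for comparison
```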

2.3 How this relates to the fit measures usually reported for LMMs

Measure What it tells you Best used for Limitation
Log-likelihood / REML criterion Fit of the model to the data, on the likelihood scale Formal tests (likelihood-ratio tests for nested models) Not on a “% explained” scale; scale-dependent
AIC / BIC Likelihood penalised for model complexity Comparing candidate models (lower = better) Relative only — no absolute interpretation
Residual variance (\(\sigma^2_{\varepsilon}\)) Unexplained variance after fixed + random effects Diagnostics; combined with random variances to form ICC / \(R^2\) Not standardised
ICC Share of residual variance attributable to a grouping level Quantifying clustering (e.g. how school-like are schools within an LA?) Defined per random effect; no fixed-effects contribution
Marginal R² (\(R^2_m\)) Variance explained by fixed effects only Communicating the predictive value of the measured variables Ignores the grouping structure
Conditional R² (\(R^2_c\)) Variance explained by fixed + random effects Communicating the full explanatory reach of the model Includes structural variation the analyst did not directly measure

In practice, AIC/BIC and likelihood-ratio tests are what one uses to choose between model specifications; \(R^2_m\) and \(R^2_c\) are what one uses to communicate how much of the outcome the chosen model accounts for. The two roles are complementary rather than competing, and both are reported throughout the fit tables that follow.
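To make that division of labour concrete: AIC/BIC and likelihood-ratio tests pick the specification; \(R^2_m\) and \(R^2_c\) then describe it. A sketch, again on sleepstudy:

```r
library(lme4)
library(performance)

# Choose between nested specifications on the likelihood scale (ML, not REML,
# because the fixed effects differ between the two models)...
m0 <- lmer(Reaction ~ 1 + (1 | Subject), data = sleepstudy, REML = FALSE)
m1 <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy, REML = FALSE)
anova(m0, m1)  # AIC, BIC and chi-square likelihood-ratio test

# ...then communicate the chosen model's explanatory reach
r2(m1)
```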

2.4 How to read the numbers in this report

When inspecting a fit table for any of the models below, the two most useful quick reads are:

  1. \(R^2_m\) — a large marginal R² means the measured school-level predictors (FSM, EAL, Ofsted band, teacher sickness, etc.) themselves account for a lot of the variation in attainment. If \(R^2_m\) is small, the fixed-effect story is weak even if overall fit looks good.
  2. \(R^2_c - R^2_m\) — the gap between conditional and marginal R² tells you how much additional variance is soaked up by the random-effects structure (year, Ofsted, region, LA). A large gap indicates that where and when a school sits matters substantially over and above its measured characteristics; a small gap indicates the fixed effects are doing most of the work.
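These two quick reads can be pulled out programmatically. A sketch of a hypothetical convenience function (r2_quick_reads is not part of this report's code), demonstrated on sleepstudy:

```r
library(lme4)
library(performance)

# Hypothetical helper: the two quick reads for any fitted lmer model
r2_quick_reads <- function(m) {
  r2v <- r2(m)
  c(R2m    = as.numeric(r2v$R2_marginal),
    RE_gap = as.numeric(r2v$R2_conditional) - as.numeric(r2v$R2_marginal))
}

m <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
r2_quick_reads(m)
```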

3 Panel Models (Analysis A)

The panel models pool all four academic years, with year_label entering as a random intercept alongside Ofsted rating and local authority (nested within region). This specification treats year-to-year variation as a variance component rather than a fixed trend.

\[ \log(y_{ij}) = \mathbf{X}_{ij}\boldsymbol{\beta} + u_{\text{year}} + u_{\text{Ofsted}} + u_{\text{region}} + u_{\text{LA}} + \varepsilon_{ij} \]
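The random-effects structure of this specification can be sketched in lme4 syntax on simulated data. All names below are illustrative placeholders, not the report's actual variables; the full, runnable specification is in the Model Experiments report:

```r
library(lme4)

set.seed(42)
# Toy panel with the same grouping structure: year and Ofsted rating entered
# as crossed random intercepts, LA nested within region
d <- expand.grid(
  year   = paste0("Y", 1:4),
  region = paste0("R", 1:5),
  la     = paste0("LA", 1:8),
  school = 1:6
)
d$ofsted <- sample(c("Outstanding", "Good", "Requires improvement", "Inadequate"),
                   nrow(d), replace = TRUE)
d$x     <- rnorm(nrow(d))
d$log_y <- 4.5 + 0.1 * d$x + rnorm(nrow(d), sd = 0.1)

# (1 | region/la) expands to (1 | region) + (1 | region:la),
# matching the LANAME:gor_name nesting used in this report
m <- lmer(log_y ~ x + (1 | year) + (1 | ofsted) + (1 | region/la), data = d)
VarCorr(m)
```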

3.1 Fixed Effects

Show code
# Build combined table
panel_fe <- map_dfr(names(panel_models), function(mn) {
  m <- panel_models[[mn]]
  fe <- as.data.frame(summary(m)$coefficients)
  fe$term <- rownames(fe)

  fe %>%
    transmute(
      Model = outcome_labels[mn],
      Term = sapply(term, function(t) {
        raw <- gsub("^log\\((.+)\\)$", "\\1", t)
        lbl <- var_label(raw)
        if (grepl("^log\\(", t)) lbl <- paste0("log(", lbl, ")")
        lbl
      }),
      Estimate = round(Estimate, 4),
      `Std. Error` = round(`Std. Error`, 4),
      `t value` = round(`t value`, 2),
      `p value` = ifelse(`Pr(>|t|)` < 0.001, "< 0.001",
                         ifelse(`Pr(>|t|)` < 0.01, "< 0.01",
                                ifelse(`Pr(>|t|)` < 0.05, "< 0.05",
                                       round(`Pr(>|t|)`, 3))))
    )
})

panel_fe %>%
  kable(align = "llrrrr") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, font_size = 11) %>%
  collapse_rows(columns = 1, valign = "top")
Table 1: Fixed effect estimates from panel models (all years pooled, year as random effect)
Model Term Estimate Std. Error t value p value
All Pupils (Intercept) 4.6314 0.0412 112.35 < 0.001
All Pupils log(% KS4 pupils who are disadvantaged) -0.0637 0.0029 -21.72 < 0.001
All Pupils log(% overall absence) -0.2099 0.0052 -40.34 < 0.001
All Pupils log(% pupils with English not first language) 0.0062 0.0013 4.83 < 0.001
All Pupils % KS4 pupils with low prior attainment (KS2) -0.0061 0.0002 -40.46 < 0.001
All Pupils Admissions: Other non-selective (ref: non-sel. in highly sel. area) 0.0010 0.0078 0.13 0.896
All Pupils Admissions: Selective (ref: non-sel. in highly sel. area) 0.1070 0.0073 14.68 < 0.001
All Pupils Gorard segregation index -0.0178 0.0491 -0.36 0.717
All Pupils Teachers remaining in same school (FTE) 0.0005 0.0001 9.19 < 0.001
All Pupils % teachers on leadership pay range -0.0014 0.0002 -6.12 < 0.001
All Pupils log(Avg. teacher sickness days) -0.0139 0.0026 -5.39 < 0.001
Disadvantaged (Intercept) 4.3596 0.0516 84.48 < 0.001
Disadvantaged log(% KS4 pupils who are disadvantaged) 0.0108 0.0042 2.60 < 0.01
Disadvantaged log(% overall absence) -0.2961 0.0069 -43.05 < 0.001
Disadvantaged log(% pupils with English not first language) 0.0232 0.0017 13.67 < 0.001
Disadvantaged % KS4 pupils with low prior attainment (KS2) -0.0059 0.0002 -28.91 < 0.001
Disadvantaged Admissions: Other non-selective (ref: non-sel. in highly sel. area) -0.0194 0.0106 -1.84 0.067
Disadvantaged Admissions: Selective (ref: non-sel. in highly sel. area) 0.2789 0.0097 28.61 < 0.001
Disadvantaged Gorard segregation index 0.0306 0.0648 0.47 0.637
Disadvantaged Teachers remaining in same school (FTE) 0.0000 0.0001 -0.11 0.915
Disadvantaged % teachers on leadership pay range -0.0010 0.0003 -3.40 < 0.001
Disadvantaged log(Avg. teacher sickness days) -0.0182 0.0034 -5.33 < 0.001
Non-Disadvantaged (Intercept) 4.5241 0.0370 122.27 < 0.001
Non-Disadvantaged log(% KS4 pupils who are disadvantaged) -0.0369 0.0027 -13.82 < 0.001
Non-Disadvantaged log(% overall absence) -0.1685 0.0044 -38.61 < 0.001
Non-Disadvantaged log(% pupils with English not first language) 0.0078 0.0011 7.20 < 0.001
Non-Disadvantaged % KS4 pupils with low prior attainment (KS2) -0.0056 0.0001 -43.71 < 0.001
Non-Disadvantaged Admissions: Other non-selective (ref: non-sel. in highly sel. area) 0.0012 0.0071 0.16 0.87
Non-Disadvantaged Admissions: Selective (ref: non-sel. in highly sel. area) 0.1183 0.0062 19.02 < 0.001
Non-Disadvantaged Gorard segregation index -0.0485 0.0456 -1.06 0.288
Non-Disadvantaged Teachers remaining in same school (FTE) 0.0005 0.0000 10.38 < 0.001
Non-Disadvantaged % teachers on leadership pay range -0.0014 0.0002 -7.56 < 0.001
Non-Disadvantaged log(Avg. teacher sickness days) -0.0165 0.0022 -7.64 < 0.001

3.2 Coefficient Comparison Plot

Show code
panel_coef_data <- map_dfr(names(panel_models), function(mn) {
  m <- panel_models[[mn]]
  fe <- as.data.frame(summary(m)$coefficients)
  fe$term <- rownames(fe)

  fe %>%
    filter(term != "(Intercept)") %>%
    transmute(
      model = outcome_labels[mn],
      term_raw = term,
      term_label = sapply(term, function(t) {
        raw <- gsub("^log\\((.+)\\)$", "\\1", t)
        lbl <- var_label(raw)
        if (grepl("^log\\(", t)) lbl <- paste0("log(", lbl, ")")
        lbl
      }),
      estimate = Estimate,
      se = `Std. Error`,
      ci_lo = Estimate - 1.96 * `Std. Error`,
      ci_hi = Estimate + 1.96 * `Std. Error`
    )
})

ggplot(panel_coef_data,
       aes(x = estimate, y = reorder(term_label, estimate),
           colour = model)) +
  geom_vline(xintercept = 0, linetype = "dashed", colour = "grey50") +
  geom_pointrange(aes(xmin = ci_lo, xmax = ci_hi),
                  position = position_dodge(width = 0.6), size = 0.4) +
  scale_colour_manual(values = c("All Pupils" = "#2e6260",
                                 "Disadvantaged" = "#4e3c56",
                                 "Non-Disadvantaged" = "#abc766"),
                      name = NULL) +
  labs(x = "Estimate (log scale)", y = NULL,
       subtitle = "Points = estimate; bars = 95% CI") +
  theme_pub
Figure 1: Fixed effect estimates with 95% confidence intervals (panel models)

3.3 Random Effects & Model Fit

The multilevel model includes four levels of random intercepts, each allowing the baseline log(ATT8) to vary across groups:

  1. Academic year (year_label) — captures system-wide year-on-year shifts in attainment (e.g. grade inflation, COVID recovery effects) that affect all schools equally within a given year.
  2. Ofsted rating (OFSTEDRATING_1, harmonised to four levels) — allows schools with different inspection outcomes to have systematically different attainment baselines, beyond what the fixed-effect predictors explain.
  3. Government Office Region (gor_name) — accounts for broad regional differences in attainment (e.g. London vs. the North East), which may reflect regional labour markets, funding patterns, or cultural factors.
  4. Local authority nested within region (LANAME:gor_name) — the finest grouping level, capturing LA-specific effects such as local policy environments, demographic composition, or authority-level support services, over and above the regional baseline.

The table below reports the estimated variance for each random effect, along with the residual variance and overall model fit (marginal R², which reflects fixed effects only, and conditional R², which includes both fixed and random effects).

Show code
panel_fit <- map_dfr(names(panel_models), function(mn) {
  m <- panel_models[[mn]]
  d <- panel_diag[[mn]]
  vc <- as.data.frame(VarCorr(m))

  # Extract variance for each grouping
  var_year   <- vc$vcov[vc$grp == "year_label"]
  var_ofsted <- vc$vcov[vc$grp == "OFSTEDRATING_1"]
  var_la     <- vc$vcov[vc$grp == "LANAME:gor_name"]
  var_region <- vc$vcov[vc$grp == "gor_name"]
  var_resid  <- d$sigma^2

  tibble(
    Model = outcome_labels[mn],
    var_year   = round(var_year, 6),
    var_ofsted = round(var_ofsted, 6),
    var_la     = round(var_la, 6),
    var_region = round(var_region, 6),
    var_resid  = round(var_resid, 6),
    r2m = round(d$r2_marginal, 4),
    r2c = round(d$r2_conditional, 4),
    N = d$n_obs
  )
})

names(panel_fit)[2:8] <- c(
  paste0("\u03c3\u00b2", "<sub>year</sub>"),
  paste0("\u03c3\u00b2", "<sub>Ofsted</sub>"),
  paste0("\u03c3\u00b2", "<sub>LA</sub>"),
  paste0("\u03c3\u00b2", "<sub>region</sub>"),
  paste0("\u03c3\u00b2", "<sub>resid</sub>"),
  "R\u00b2<sub>m</sub>",
  "R\u00b2<sub>c</sub>"
)

panel_fit %>%
  kable(format = "html", align = "lrrrrrrrr", escape = FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, font_size = 11)
Table 2: Random effect variance components and model fit statistics (panel models)
Model σ²year σ²Ofsted σ²LA σ²region σ²resid R²m R²c N
All Pupils 0.002193 0.002408 0.000806 0.000263 0.007947 0.6409 0.7904 9047
Disadvantaged 0.003634 0.002997 0.001311 0.000764 0.013523 0.5480 0.7251 8936
Non-Disadvantaged 0.001831 0.001885 0.000853 0.000203 0.005408 0.6081 0.7918 8936

3.4 Random Intercepts

Show code
m_all <- panel_models$all
re <- ranef(m_all)

# Year random effects
re_year <- tibble(
  year = rownames(re$year_label),
  intercept = re$year_label[, 1]
)

p_year <- ggplot(re_year, aes(x = reorder(year, intercept), y = intercept)) +
  geom_col(fill = "#2e6260", alpha = 0.8) +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "grey40") +
  coord_flip() +
  labs(title = "Year Random Intercepts", x = NULL, y = "Intercept (log scale)") +
  theme_pub

# Ofsted random effects (harmonised)
re_ofsted <- tibble(
  rating = rownames(re$OFSTEDRATING_1),
  intercept = re$OFSTEDRATING_1[, 1]
)

p_ofsted <- ggplot(re_ofsted, aes(x = reorder(rating, -intercept), y = intercept)) +
  geom_col(fill = "#abc766", alpha = 0.8) +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "grey40") +
  coord_flip() +
  labs(title = "Ofsted Rating Random Intercepts (harmonised)", x = NULL, y = "Intercept (log scale)") +
  theme_pub

# Region random effects
re_region <- tibble(
  region = rownames(re$gor_name),
  intercept = re$gor_name[, 1]
)

p_region <- ggplot(re_region, aes(x = reorder(region, intercept), y = intercept)) +
  geom_col(fill = "#4e3c56", alpha = 0.8) +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "grey40") +
  coord_flip() +
  labs(title = "Region Random Intercepts", x = NULL, y = "Intercept (log scale)") +
  theme_pub

p_year / p_ofsted / p_region
Figure 2: Random intercept estimates by grouping level (All Pupils panel model)

3.5 Null Model: Variance Structure

The violin and box plots below show the distribution of Attainment 8 scores across the random-effect grouping variables. These visualise the between-group variance that the null (intercept-only) multilevel model captures — the same structure shown in more detail in the variance components table above.

Show code
# Fit null model: ATT8 with only Ofsted as random intercept
null_ofsted <- lmer(log(ATT8SCR) ~ 1 + (1 | OFSTEDRATING_1), data = model_data)
grand_mean_ofsted <- exp(fixef(null_ofsted)[1])

# Extract shrunken group means (BLUPs)
re_ofsted_null <- ranef(null_ofsted)$OFSTEDRATING_1 %>%
  as.data.frame() %>%
  rownames_to_column("OFSTEDRATING_1") %>%
  rename(blup = `(Intercept)`) %>%
  mutate(shrunken_mean = exp(fixef(null_ofsted)[1] + blup))

# Raw group means
raw_means_ofsted <- model_data %>%
  filter(!is.na(OFSTEDRATING_1), !is.na(ATT8SCR)) %>%
  group_by(OFSTEDRATING_1) %>%
  summarise(raw_mean = mean(ATT8SCR, na.rm = TRUE), n = n(), .groups = "drop") %>%
  left_join(re_ofsted_null, by = "OFSTEDRATING_1")

model_data %>%
  filter(!is.na(OFSTEDRATING_1), !is.na(ATT8SCR)) %>%
  ggplot(aes(x = OFSTEDRATING_1, y = ATT8SCR, fill = OFSTEDRATING_1)) +
  geom_violin(alpha = 0.3, colour = NA) +
  geom_boxplot(width = 0.15, alpha = 0.7, outlier.size = 0.5, outlier.alpha = 0.3) +
  geom_hline(yintercept = grand_mean_ofsted, linetype = "dashed",
             colour = "black", linewidth = 0.8) +
  geom_point(data = raw_means_ofsted, aes(y = raw_mean),
             shape = 4, size = 3, colour = "#7b132b", stroke = 1.2) +
  geom_point(data = raw_means_ofsted, aes(y = shrunken_mean),
             shape = 18, size = 4, colour = "#2e6260") +
  annotate("text", x = 0.55, y = grand_mean_ofsted + 1.5,
           label = paste0("Grand mean = ", round(grand_mean_ofsted, 1)),
           hjust = 0, size = 3.2, colour = "grey30") +
  labs(title = "ATT8 Distribution by Ofsted Rating (Null Model Structure)",
       subtitle = "Dashed = grand mean | Teal diamond = shrunken mean (BLUP) | Dark red cross = raw mean",
       x = "Ofsted Rating (harmonised)", y = "Attainment 8 Score") +
  theme_pub +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 30, hjust = 1))
Figure 3: Distribution of Attainment 8 scores by Ofsted rating — visualising the null model variance structure
Show code
null_region <- lmer(log(ATT8SCR) ~ 1 + (1 | gor_name), data = model_data)
grand_mean_region <- exp(fixef(null_region)[1])

re_region_null <- ranef(null_region)$gor_name %>%
  as.data.frame() %>%
  rownames_to_column("gor_name") %>%
  rename(blup = `(Intercept)`) %>%
  mutate(shrunken_mean = exp(fixef(null_region)[1] + blup))

raw_means_region <- model_data %>%
  filter(!is.na(gor_name), !is.na(ATT8SCR)) %>%
  group_by(gor_name) %>%
  summarise(raw_mean = mean(ATT8SCR, na.rm = TRUE), n = n(), .groups = "drop") %>%
  left_join(re_region_null, by = "gor_name")

model_data %>%
  filter(!is.na(gor_name), !is.na(ATT8SCR)) %>%
  ggplot(aes(x = reorder(gor_name, ATT8SCR, FUN = median),
             y = ATT8SCR, fill = gor_name)) +
  geom_violin(alpha = 0.3, colour = NA) +
  geom_boxplot(width = 0.15, alpha = 0.7, outlier.size = 0.5, outlier.alpha = 0.3) +
  geom_hline(yintercept = grand_mean_region, linetype = "dashed",
             colour = "black", linewidth = 0.8) +
  geom_point(data = raw_means_region,
             aes(x = reorder(gor_name, raw_mean), y = raw_mean),
             shape = 4, size = 3, colour = "#7b132b", stroke = 1.2) +
  geom_point(data = raw_means_region,
             aes(x = reorder(gor_name, raw_mean), y = shrunken_mean),
             shape = 18, size = 4, colour = "#2e6260") +
  coord_flip() +
  labs(title = "ATT8 Distribution by Region (Null Model Structure)",
       subtitle = "Dashed = grand mean | Teal diamond = shrunken mean (BLUP) | Dark red cross = raw mean",
       x = NULL, y = "Attainment 8 Score") +
  theme_pub +
  theme(legend.position = "none")
Figure 4: Distribution of Attainment 8 scores by Government Office Region — null model variance structure
Show code
model_data %>%
  filter(!is.na(OFSTEDRATING_1), !is.na(ATT8SCR)) %>%
  ggplot(aes(x = OFSTEDRATING_1, y = ATT8SCR, fill = OFSTEDRATING_1)) +
  geom_violin(alpha = 0.3, colour = NA) +
  geom_boxplot(width = 0.15, alpha = 0.7, outlier.size = 0.3, outlier.alpha = 0.2) +
  facet_wrap(~ year_label, ncol = 2) +
  labs(title = "ATT8 by Ofsted Rating — by Academic Year",
       subtitle = "Violin + boxplot shows the distributional shape within each Ofsted category",
       x = "Ofsted Rating (harmonised)", y = "Attainment 8 Score") +
  theme_pub +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "none")
Figure 5: Distribution of Attainment 8 by Ofsted rating, faceted by academic year

3.6 Caterpillar Plot: LA Random Intercepts

The caterpillar plot below displays the estimated random intercepts for each local authority (nested within region), ordered by magnitude. Error bars show 95% confidence intervals (±1.96 posterior SE). Local authorities whose interval does not cross zero are meaningfully different from the overall mean, after accounting for fixed effects and other random effects. Brighton and Hove is highlighted in pink.

Show code
library(ggrepel)

# Extract LA random effects with conditional variances
re_list_imp <- ranef(imputed_models$all, condVar = TRUE)
re_la <- as.data.frame(re_list_imp$`LANAME:gor_name`)
re_la$group <- rownames(re_list_imp$`LANAME:gor_name`)

# Posterior standard errors
se_la <- sqrt(attr(re_list_imp$`LANAME:gor_name`, "postVar")[1, 1, ])
re_la$se <- se_la

re_la <- re_la %>%
  mutate(
    la_name = sub(":.*$", "", group),
    ci_lo = `(Intercept)` - 1.96 * se,
    ci_hi = `(Intercept)` + 1.96 * se,
    sig = ci_lo > 0 | ci_hi < 0,
    highlight = ifelse(grepl("Brighton", la_name), "Brighton and Hove", "Other")
  )

# Top and bottom 10 for labelling (+ always include Brighton)
extreme_la <- c(
  head(re_la$la_name[order(re_la$`(Intercept)`)], 10),
  tail(re_la$la_name[order(re_la$`(Intercept)`)], 10)
)
if (any(grepl("Brighton", re_la$la_name))) {
  extreme_la <- unique(c(extreme_la,
                          re_la$la_name[grepl("Brighton", re_la$la_name)]))
}

ggplot(re_la, aes(x = reorder(group, `(Intercept)`), y = `(Intercept)`)) +
  geom_errorbar(aes(ymin = ci_lo, ymax = ci_hi),
                width = 0, colour = "grey60", linewidth = 0.3) +
  geom_point(aes(colour = highlight), size = 1.5) +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "grey40") +
  geom_text_repel(
    data = re_la %>% filter(la_name %in% extreme_la),
    aes(label = la_name),
    size = 2.5, max.overlaps = 25, force = 4, nudge_y = 0.005,
    segment.colour = "grey60", segment.size = 0.3
  ) +
  scale_colour_manual(values = c("Other" = "#49a0c4",
                                  "Brighton and Hove" = "#e16fca"),
                      guide = "none") +
  labs(title = "LA Random Intercepts — Imputed Full Panel Model (All Pupils)",
       subtitle = "Ordered by estimate; error bars = 95% CI. LAs whose CI excludes 0 differ meaningfully from average.",
       x = "Local Authority (ordered by random intercept)",
       y = "Random Intercept (log scale)") +
  theme_pub +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())
Figure 6: Caterpillar plot of LA random intercepts (imputed full panel model, All Pupils). Top and bottom 10 LAs labelled; Brighton and Hove in pink.

The plots below show the same caterpillar format for disadvantaged and non-disadvantaged Attainment 8 models. The LA ordering will differ between the two — local authorities that perform well for disadvantaged pupils are not necessarily the same ones that perform well for non-disadvantaged pupils.

Show code
plot_la_caterpillar(imputed_models$disadvantaged,
                    "LA Random Intercepts — Imputed Full Panel Model (Disadvantaged Pupils)")
Figure 7: Caterpillar plot of LA random intercepts (imputed full panel model, Disadvantaged Pupils). Top and bottom 10 LAs labelled; Brighton and Hove in pink.
Show code
plot_la_caterpillar(imputed_models$non_disadvantaged,
                    "LA Random Intercepts — Imputed Full Panel Model (Non-Disadvantaged Pupils)")
Figure 8: Caterpillar plot of LA random intercepts (imputed full panel model, Non-Disadvantaged Pupils). Top and bottom 10 LAs labelled; Brighton and Hove in pink.

3.7 Residual Diagnostics

Having examined the random effects structure, we now turn to the model’s residuals to check whether the key assumptions of the multilevel model are met: linearity, homoscedasticity (constant variance), and approximate normality of residuals.

Show code
m_all <- panel_models$all

resid_data <- tibble(
  fitted = fitted(m_all),
  resid  = residuals(m_all, type = "pearson")
) %>%
  filter(!is.na(fitted), !is.na(resid)) %>%
  mutate(resid_std = resid / sd(resid))

p1 <- ggplot(resid_data, aes(x = fitted, y = resid)) +
  geom_point(alpha = 0.1, size = 0.5, colour = "#2e6260") +
  geom_hline(yintercept = 0, colour = "#7b132b", linetype = "dashed") +
  geom_smooth(method = "loess", se = FALSE, colour = "#7b132b", linewidth = 0.7) +
  labs(title = "Residuals vs Fitted", x = "Fitted values (log scale)", y = "Pearson residuals") +
  theme_pub

p2 <- ggplot(resid_data, aes(sample = resid_std)) +
  stat_qq(alpha = 0.1, size = 0.5, colour = "#2e6260") +
  stat_qq_line(colour = "#7b132b", linewidth = 0.7) +
  labs(title = "Normal Q-Q", x = "Theoretical quantiles", y = "Standardised residuals") +
  theme_pub

p3 <- ggplot(resid_data, aes(x = resid)) +
  geom_histogram(bins = 80, fill = "#2e6260", alpha = 0.7, colour = "white") +
  geom_vline(xintercept = 0, colour = "#7b132b", linetype = "dashed") +
  labs(title = "Residual Distribution", x = "Pearson residuals", y = "Frequency") +
  theme_pub

p4 <- ggplot(resid_data, aes(x = fitted, y = sqrt(abs(resid_std)))) +
  geom_point(alpha = 0.1, size = 0.5, colour = "#2e6260") +
  geom_smooth(method = "loess", se = FALSE, colour = "#7b132b", linewidth = 0.7) +
  labs(title = "Scale-Location", x = "Fitted values (log scale)",
       y = expression(sqrt("|Standardised residuals|"))) +
  theme_pub

(p1 + p2) / (p3 + p4) +
  plot_annotation(
    title = "Panel Model Diagnostics: All Pupils",
    subtitle = "log(ATT8SCR) ~ fixed effects + (1|year) + (1|Ofsted harmonised) + (1|region/LA)"
  )
Figure 9: Residual diagnostics for the All Pupils panel model

4 Per-Year Models (Analysis B)

Separate models are fitted for each academic year with the same fixed-effect specification but without a year term. This allows all coefficients to vary freely across years, revealing whether relationships between predictors and attainment are stable or shifting over time.

4.1 Model Fit Comparison Across Years

Show code
yearly_fit <- map_dfr(names(yearly_diag), function(yr) {
  map_dfr(names(yearly_diag[[yr]]), function(mn) {
    d <- yearly_diag[[yr]][[mn]]
    tibble(
      Year = yr,
      Outcome = outcome_labels[mn],
      N = ifelse(is.na(d$r2_marginal), NA_integer_, d$n_obs),
      r2m = round(d$r2_marginal, 4),
      r2c = round(d$r2_conditional, 4),
      sigma = round(d$sigma, 4)
    )
  })
})

names(yearly_fit)[4:6] <- c(
  "R\u00b2<sub>m</sub>",
  "R\u00b2<sub>c</sub>",
  "\u03c3"
)

yearly_fit %>%
  kable(format = "html", align = "llrrrr", escape = FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, font_size = 11) %>%
  collapse_rows(columns = 1, valign = "top")
Table 3: Model fit statistics by year and outcome
Year Outcome N R²m R²c σ
2021-22 All Pupils 2943 0.6728 0.7822 0.0841
Disadvantaged 2905 0.5695 0.6732 0.1164
Non-Disadvantaged 2905 0.6238 0.7441 0.0773
2022-23 All Pupils 2961 0.7301 0.8402 0.0738
Disadvantaged 2921 0.5848 0.7015 0.1163
Non-Disadvantaged 2921 0.6476 0.7856 0.0721
2023-24 All Pupils 3143 0.6423 0.7162 0.1060
Disadvantaged 3110 0.6299 0.7271 0.1166
Non-Disadvantaged 3110 0.6820 0.7872 0.0718

4.2 Coefficient Stability Across Years

Show code
yearly_coef_data <- map_dfr(names(yearly_models), function(yr) {
  m <- yearly_models[[yr]]$all
  if (is.null(m)) return(NULL)

  fe <- as.data.frame(summary(m)$coefficients)
  fe$term <- rownames(fe)

  fe %>%
    filter(term != "(Intercept)") %>%
    transmute(
      year = yr,
      term_raw = term,
      term_label = sapply(term, function(t) {
        raw <- gsub("^log\\((.+)\\)$", "\\1", t)
        lbl <- var_label(raw)
        if (grepl("^log\\(", t)) lbl <- paste0("log(", lbl, ")")
        lbl
      }),
      estimate = Estimate,
      se = `Std. Error`,
      ci_lo = Estimate - 1.96 * `Std. Error`,
      ci_hi = Estimate + 1.96 * `Std. Error`
    )
})

if (nrow(yearly_coef_data) > 0) {
  ggplot(yearly_coef_data,
         aes(x = year, y = estimate, group = 1)) +
    geom_hline(yintercept = 0, linetype = "dashed", colour = "grey60") +
    geom_ribbon(aes(ymin = ci_lo, ymax = ci_hi), alpha = 0.15, fill = "#2e6260") +
    geom_line(colour = "#2e6260", linewidth = 0.7) +
    geom_point(colour = "#2e6260", size = 2) +
    facet_wrap(~ term_label, scales = "free_y", ncol = 3) +
    labs(x = "Academic Year", y = "Estimate (log scale)",
         subtitle = "Shaded band = 95% CI") +
    theme_pub +
    theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 8))
}
Figure 10: How fixed-effect estimates change across academic years (All Pupils models)

4.3 Side-by-Side Coefficient Table (Per-Year, All Pupils)

Show code
yearly_coef_wide <- yearly_coef_data %>%
  mutate(
    # Look up p-values from the original models
    sig = map2_chr(year, term_raw, function(yr, term) {
      m <- yearly_models[[yr]]$all
      if (is.null(m)) return("")
      fe <- as.data.frame(summary(m)$coefficients)
      p <- fe[term, "Pr(>|t|)"]
      if (is.na(p)) "" else if (p < 0.001) "***" else if (p < 0.01) "**" else if (p < 0.05) "*" else ""
    }),
    cell = paste0(round(estimate, 4), sig)
  ) %>%
  select(term_label, year, cell) %>%
  pivot_wider(names_from = year, values_from = cell)

yearly_coef_wide %>%
  rename(Variable = term_label) %>%
  kable(align = c("l", rep("r", ncol(yearly_coef_wide) - 1))) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, font_size = 11)
Table 4: Fixed effect estimates by year (All Pupils outcome). Significance: *** p<0.001, ** p<0.01, * p<0.05
Variable 2021-22 2022-23 2023-24
log(% KS4 pupils who are disadvantaged) -0.056*** -0.0703*** -0.0622***
log(% overall absence) -0.1956*** -0.215*** -0.2118***
log(% pupils with English not first language) 0.0046* 0.0118*** 0.0082***
% KS4 pupils with low prior attainment (KS2) -0.0059*** -0.0062*** -0.0068***
Admissions: Other non-selective (ref: non-sel. in highly sel. area) 0.0181 -0.0027 0.0179
Admissions: Selective (ref: non-sel. in highly sel. area) 0.122*** 0.087*** 0.1116***
Gorard segregation index 0.0074 -0.0907 -0.0546
Teachers remaining in same school (FTE) 0.0007*** 0.0003*** 0.0004***
% teachers on leadership pay range -0.0019*** -0.0015*** -0.001*
log(Avg. teacher sickness days) -0.0065 -0.0192*** -0.0184***

5 Core Models (All 4 Years)

School Workforce Census variables (remained_in_the_same_school, teachers_on_leadership_pay_range_percent, average_number_of_days_taken) and KS2 prior attainment (PTPRIORLO) are not yet available for 2024-25. To enable cross-year comparison including the latest year, a core specification is fitted using only the 5 predictors available across all four years:

\[ \log(y_{ij}) = \beta_0 + \beta_1\log(\text{FSM\%}) + \beta_2\log(\text{Absence\%}) + \beta_3\log(\text{EAL\%}) + \beta_4\text{Admissions} + \beta_5\text{Gorard} + u_{\text{year}} + u_{\text{Ofsted}} + u_{\text{region}} + u_{\text{LA}} + \varepsilon_{ij} \]
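The equation above maps directly onto an lme4 call. A sketch, assuming a data frame `core_data` with the column names used in the other model blocks in this report:

```r
library(lme4)

# Sketch only: the core 5-predictor specification; `core_data` is an
# assumed object name, not part of the report pipeline
core_sketch <- lmer(
  log(ATT8SCR) ~ log(PTFSM6CLA1A) + log(PERCTOT) + log(PNUMEAL) +
    ADMPOL_PT + gorard_segregation +
    (1 | year_label) + (1 | OFSTEDRATING_1) + (1 | gor_name/LANAME),
  data = core_data, REML = TRUE
)
```

Note that `(1 | gor_name/LANAME)` expands to region plus LA-nested-in-region intercepts, giving the four variance components reported in Table 6.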

5.1 Core Panel: Fixed Effects

Show code
core_panel_fe <- map_dfr(names(core_models), function(mn) {
  m <- core_models[[mn]]
  fe <- as.data.frame(summary(m)$coefficients)
  fe$term <- rownames(fe)

  fe %>%
    transmute(
      Model = outcome_labels[mn],
      Term = sapply(term, function(t) {
        raw <- gsub("^log\\((.+)\\)$", "\\1", t)
        lbl <- var_label(raw)
        if (grepl("^log\\(", t)) lbl <- paste0("log(", lbl, ")")
        lbl
      }),
      Estimate = round(Estimate, 4),
      `Std. Error` = round(`Std. Error`, 4),
      `t value` = round(`t value`, 2),
      `p value` = ifelse(`Pr(>|t|)` < 0.001, "< 0.001",
                         ifelse(`Pr(>|t|)` < 0.01, "< 0.01",
                                ifelse(`Pr(>|t|)` < 0.05, "< 0.05",
                                       round(`Pr(>|t|)`, 3))))
    )
})

core_panel_fe %>%
  kable(align = "llrrrr") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, font_size = 11) %>%
  collapse_rows(columns = 1, valign = "top")
Table 5: Fixed effect estimates from core panel models (all 4 years, reduced specification)
Model Term Estimate Std. Error t value p value
All Pupils (Intercept) 4.7841 0.0374 127.90 < 0.001
All Pupils log(% KS4 pupils who are disadvantaged) -0.1280 0.0026 -50.05 < 0.001
All Pupils log(% overall absence) -0.2614 0.0047 -55.26 < 0.001
All Pupils log(% pupils with English not first language) 0.0002 0.0013 0.16 0.873
All Pupils Admissions: Other non-selective (ref: non-sel. in highly sel. area) 0.0123 0.0081 1.52 0.129
All Pupils Admissions: Selective (ref: non-sel. in highly sel. area) 0.1614 0.0068 23.68 < 0.001
All Pupils Gorard segregation index -0.0673 0.0544 -1.24 0.217
Disadvantaged (Intercept) 4.4616 0.0458 97.49 < 0.001
Disadvantaged log(% KS4 pupils who are disadvantaged) -0.0482 0.0033 -14.80 < 0.001
Disadvantaged log(% overall absence) -0.3418 0.0058 -59.03 < 0.001
Disadvantaged log(% pupils with English not first language) 0.0178 0.0015 11.73 < 0.001
Disadvantaged Admissions: Other non-selective (ref: non-sel. in highly sel. area) -0.0120 0.0103 -1.17 0.244
Disadvantaged Admissions: Selective (ref: non-sel. in highly sel. area) 0.3402 0.0084 40.59 < 0.001
Disadvantaged Gorard segregation index -0.0592 0.0669 -0.89 0.376
Non-Disadvantaged (Intercept) 4.6883 0.0330 141.86 < 0.001
Non-Disadvantaged log(% KS4 pupils who are disadvantaged) -0.1054 0.0022 -47.17 < 0.001
Non-Disadvantaged log(% overall absence) -0.2107 0.0040 -53.28 < 0.001
Non-Disadvantaged log(% pupils with English not first language) 0.0033 0.0010 3.16 < 0.01
Non-Disadvantaged Admissions: Other non-selective (ref: non-sel. in highly sel. area) 0.0010 0.0074 0.13 0.894
Non-Disadvantaged Admissions: Selective (ref: non-sel. in highly sel. area) 0.1683 0.0058 29.19 < 0.001
Non-Disadvantaged Gorard segregation index -0.1127 0.0503 -2.24 < 0.05
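Because the outcome and several predictors enter in logs, the corresponding coefficients in Table 5 can be read as elasticities: for a predictor entered as log(x), a k-fold change in x multiplies ATT8SCR by k^β. A quick base R check using the All Pupils absence coefficient copied from the table:

```r
# Coefficient copied from Table 5 (All Pupils, log(% overall absence))
beta_absence <- -0.2614

# Implied percentage change in ATT8SCR for a 10% rise in absence
pct_change <- (1.1^beta_absence - 1) * 100
round(pct_change, 2)
# roughly -2.46: a 10% higher absence rate is associated with ~2.5%
# lower Attainment 8, conditional on the other predictors
```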

5.2 Core Panel: Random Effects & Model Fit

Show code
core_panel_fit <- map_dfr(names(core_models), function(mn) {
  m <- core_models[[mn]]
  d <- core_diag[[mn]]
  vc <- as.data.frame(VarCorr(m))

  var_year   <- vc$vcov[vc$grp == "year_label"]
  var_ofsted <- vc$vcov[vc$grp == "OFSTEDRATING_1"]
  var_la     <- vc$vcov[vc$grp == "LANAME:gor_name"]
  var_region <- vc$vcov[vc$grp == "gor_name"]
  var_resid  <- d$sigma^2

  tibble(
    Model = outcome_labels[mn],
    var_year   = round(var_year, 6),
    var_ofsted = round(var_ofsted, 6),
    var_la     = round(var_la, 6),
    var_region = round(var_region, 6),
    var_resid  = round(var_resid, 6),
    r2m = round(d$r2_marginal, 4),
    r2c = round(d$r2_conditional, 4),
    N = d$n_obs
  )
})

names(core_panel_fit)[2:8] <- c(
  paste0("\u03c3\u00b2", "<sub>year</sub>"),
  paste0("\u03c3\u00b2", "<sub>Ofsted</sub>"),
  paste0("\u03c3\u00b2", "<sub>LA</sub>"),
  paste0("\u03c3\u00b2", "<sub>region</sub>"),
  paste0("\u03c3\u00b2", "<sub>resid</sub>"),
  "R\u00b2<sub>m</sub>",
  "R\u00b2<sub>c</sub>"
)

core_panel_fit %>%
  kable(format = "html", align = "lrrrrrrrrr", escape = FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, font_size = 11)
Table 6: Random effect variance components and model fit statistics (core panel models)
Model σ²year σ²Ofsted σ²LA σ²region σ²resid R²m R²c N
All Pupils 0.001121 0.002906 0.001468 0.001009 0.010372 0.5734 0.7378 12845
Disadvantaged 0.002415 0.003309 0.002183 0.002046 0.014964 0.5038 0.7020 12702
Non-Disadvantaged 0.000800 0.002192 0.001752 0.001117 0.006948 0.5326 0.7464 12702

5.3 Core Per-Year: Model Fit Comparison

Show code
core_yearly_fit <- map_dfr(names(core_yearly_diag), function(yr) {
  map_dfr(names(core_yearly_diag[[yr]]), function(mn) {
    d <- core_yearly_diag[[yr]][[mn]]
    tibble(
      Year = yr,
      Outcome = outcome_labels[mn],
      N = ifelse(is.na(d$r2_marginal), NA_integer_, d$n_obs),
      r2m = round(d$r2_marginal, 4),
      r2c = round(d$r2_conditional, 4),
      sigma = round(d$sigma, 4)
    )
  })
})

names(core_yearly_fit)[4:6] <- c(
  "R\u00b2<sub>m</sub>",
  "R\u00b2<sub>c</sub>",
  "\u03c3"
)

core_yearly_fit %>%
  kable(format = "html", align = "llrrrr", escape = FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, font_size = 11) %>%
  collapse_rows(columns = 1, valign = "top")
Table 7: Core model fit statistics by year and outcome (all 4 years)
Year Outcome N R²m R²c σ
2021-22 All Pupils 3200 0.5821 0.7353 0.0954
Disadvantaged 3160 0.4940 0.6507 0.1222
Non-Disadvantaged 3160 0.5061 0.6980 0.0880
2022-23 All Pupils 3225 0.6401 0.7898 0.0871
Disadvantaged 3183 0.4990 0.6642 0.1254
Non-Disadvantaged 3183 0.5316 0.7276 0.0845
2023-24 All Pupils 3210 0.5646 0.6799 0.1147
Disadvantaged 3177 0.5574 0.7020 0.1235
Non-Disadvantaged 3177 0.5677 0.7388 0.0827
2024-25 All Pupils 3210 0.5725 0.6778 0.1114
Disadvantaged 3182 0.5804 0.7014 0.1194
Non-Disadvantaged 3182 0.5824 0.7447 0.0801

5.4 Core Per-Year: Coefficient Stability

Show code
core_yearly_coef_data <- map_dfr(names(core_yearly_models), function(yr) {
  m <- core_yearly_models[[yr]]$all
  if (is.null(m)) return(NULL)

  fe <- as.data.frame(summary(m)$coefficients)
  fe$term <- rownames(fe)

  fe %>%
    filter(term != "(Intercept)") %>%
    transmute(
      year = yr,
      term_raw = term,
      term_label = sapply(term, function(t) {
        raw <- gsub("^log\\((.+)\\)$", "\\1", t)
        lbl <- var_label(raw)
        if (grepl("^log\\(", t)) lbl <- paste0("log(", lbl, ")")
        lbl
      }),
      estimate = Estimate,
      se = `Std. Error`,
      ci_lo = Estimate - 1.96 * `Std. Error`,
      ci_hi = Estimate + 1.96 * `Std. Error`
    )
})

if (nrow(core_yearly_coef_data) > 0) {
  ggplot(core_yearly_coef_data,
         aes(x = year, y = estimate, group = 1)) +
    geom_hline(yintercept = 0, linetype = "dashed", colour = "grey60") +
    geom_ribbon(aes(ymin = ci_lo, ymax = ci_hi), alpha = 0.15, fill = "#e16fca") +
    geom_line(colour = "#e16fca", linewidth = 0.7) +
    geom_point(colour = "#e16fca", size = 2) +
    facet_wrap(~ term_label, scales = "free_y", ncol = 3) +
    labs(x = "Academic Year", y = "Estimate (log scale)",
         subtitle = "Shaded band = 95% CI — core specification (5 predictors)") +
    theme_pub +
    theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 8))
}
Figure 11: How core fixed-effect estimates change across all 4 academic years (All Pupils)

5.5 Core Per-Year: Side-by-Side Coefficients

Show code
core_yearly_coef_wide <- core_yearly_coef_data %>%
  mutate(
    sig = map2_chr(year, term_raw, function(yr, term) {
      m <- core_yearly_models[[yr]]$all
      if (is.null(m)) return("")
      fe <- as.data.frame(summary(m)$coefficients)
      p <- fe[term, "Pr(>|t|)"]
      if (is.na(p)) "" else if (p < 0.001) "***" else if (p < 0.01) "**" else if (p < 0.05) "*" else ""
    }),
    cell = paste0(round(estimate, 4), sig)
  ) %>%
  select(term_label, year, cell) %>%
  pivot_wider(names_from = year, values_from = cell)

core_yearly_coef_wide %>%
  rename(Variable = term_label) %>%
  kable(align = c("l", rep("r", ncol(core_yearly_coef_wide) - 1))) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, font_size = 11)
Table 8: Core fixed effect estimates by year (All Pupils). Significance: *** p<0.001, ** p<0.01, * p<0.05
Variable 2021-22 2022-23 2023-24 2024-25
log(% KS4 pupils who are disadvantaged) -0.1257*** -0.1299*** -0.1226*** -0.1155***
log(% overall absence) -0.2497*** -0.2671*** -0.2765*** -0.2751***
log(% pupils with English not first language) -0.0016 0.005* 0.0031 0.0033
Admissions: Other non-selective (ref: non-sel. in highly sel. area) 0.0215 0.0119 0.0268 0.025
Admissions: Selective (ref: non-sel. in highly sel. area) 0.1787*** 0.1552*** 0.1775*** 0.1519***
Gorard segregation index -0.1133 -0.1521 -0.0972 -0.1542

6 Imputed Full Panel Models (Analysis E)

The imputed full model uses the same 9-predictor specification as Analysis A, but extends coverage to all four academic years by carrying forward the four variables unavailable for 2024-25 (PTPRIORLO, remained_in_the_same_school, teachers_on_leadership_pay_range_percent, average_number_of_days_taken). Each school’s 2023-24 value is used, falling back to its 3-year mean where that is missing. This is the model used in the Shiny app.

\[ \log(y_{ij}) = \mathbf{X}_{ij}\boldsymbol{\beta} + u_{\text{year}} + u_{\text{Ofsted}} + u_{\text{region}} + u_{\text{LA}} + \varepsilon_{ij} \]
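The carry-forward rule can be sketched on a toy panel. Column names follow the report, but the URNs and values are made up for illustration; the real pipeline applies this to the four variables listed above:

```r
library(dplyr)

# Toy two-school panel with PTPRIORLO missing in 2024-25 (and, for the
# second school, in earlier years too)
toy <- tibble(
  urn  = rep(c(100001, 100002), each = 4),          # made-up URNs
  year = rep(c("2021-22", "2022-23", "2023-24", "2024-25"), times = 2),
  PTPRIORLO = c(10, 12, 14, NA,   20, NA, NA, NA)
)

imputed <- toy %>%
  group_by(urn) %>%
  mutate(
    last_val = PTPRIORLO[year == "2023-24"],                  # carry-forward value
    mean_val = mean(PTPRIORLO[year != "2024-25"], na.rm = TRUE),  # 3-year mean fallback
    PTPRIORLO = if_else(
      year == "2024-25" & is.na(PTPRIORLO),
      coalesce(last_val, mean_val),
      PTPRIORLO
    )
  ) %>%
  ungroup() %>%
  select(-last_val, -mean_val)
```

School 100001 gets its 2023-24 value (14); school 100002, whose 2023-24 value is itself missing, falls back to the mean of its observed earlier years (20). Missingness in the earlier years is left untouched.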

6.1 Imputed Panel: Fixed Effects

Show code
imputed_fe <- map_dfr(names(imputed_models), function(mn) {
  m <- imputed_models[[mn]]
  fe <- as.data.frame(summary(m)$coefficients)
  fe$term <- rownames(fe)

  fe %>%
    transmute(
      Model = outcome_labels[mn],
      Term = sapply(term, function(t) {
        raw <- gsub("^log\\((.+)\\)$", "\\1", t)
        lbl <- var_label(raw)
        if (grepl("^log\\(", t)) lbl <- paste0("log(", lbl, ")")
        lbl
      }),
      Estimate = round(Estimate, 4),
      `Std. Error` = round(`Std. Error`, 4),
      `t value` = round(`t value`, 2),
      `p value` = ifelse(`Pr(>|t|)` < 0.001, "< 0.001",
                         ifelse(`Pr(>|t|)` < 0.01, "< 0.01",
                                ifelse(`Pr(>|t|)` < 0.05, "< 0.05",
                                       round(`Pr(>|t|)`, 3))))
    )
})

imputed_fe %>%
  kable(align = "llrrrr") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, font_size = 11) %>%
  collapse_rows(columns = 1, valign = "top")
Table 9: Fixed effect estimates from imputed full panel models (Analysis E: 9 predictors, all 4 years)
Model Term Estimate Std. Error t value p value
All Pupils (Intercept) 4.6328 0.0369 125.61 < 0.001
All Pupils log(% KS4 pupils who are disadvantaged) -0.0675 0.0027 -24.90 < 0.001
All Pupils log(% overall absence) -0.2132 0.0046 -46.03 < 0.001
All Pupils log(% pupils with English not first language) 0.0059 0.0012 4.92 < 0.001
All Pupils % KS4 pupils with low prior attainment (KS2) -0.0058 0.0001 -41.35 < 0.001
All Pupils Admissions: Other non-selective (ref: non-sel. in highly sel. area) 0.0006 0.0073 0.08 0.938
All Pupils Admissions: Selective (ref: non-sel. in highly sel. area) 0.1079 0.0067 16.20 < 0.001
All Pupils Gorard segregation index -0.0329 0.0474 -0.69 0.488
All Pupils Teachers remaining in same school (FTE) 0.0005 0.0000 10.20 < 0.001
All Pupils % teachers on leadership pay range -0.0011 0.0002 -5.39 < 0.001
All Pupils log(Avg. teacher sickness days) -0.0146 0.0024 -6.17 < 0.001
Disadvantaged (Intercept) 4.3620 0.0465 93.84 < 0.001
Disadvantaged log(% KS4 pupils who are disadvantaged) 0.0076 0.0037 2.08 < 0.05
Disadvantaged log(% overall absence) -0.3048 0.0058 -52.15 < 0.001
Disadvantaged log(% pupils with English not first language) 0.0232 0.0015 15.52 < 0.001
Disadvantaged % KS4 pupils with low prior attainment (KS2) -0.0053 0.0002 -29.94 < 0.001
Disadvantaged Admissions: Other non-selective (ref: non-sel. in highly sel. area) -0.0205 0.0096 -2.14 < 0.05
Disadvantaged Admissions: Selective (ref: non-sel. in highly sel. area) 0.2780 0.0085 32.80 < 0.001
Disadvantaged Gorard segregation index -0.0044 0.0604 -0.07 0.942
Disadvantaged Teachers remaining in same school (FTE) 0.0000 0.0001 0.42 0.672
Disadvantaged % teachers on leadership pay range -0.0008 0.0003 -3.13 < 0.01
Disadvantaged log(Avg. teacher sickness days) -0.0184 0.0030 -6.21 < 0.001
Non-Disadvantaged (Intercept) 4.5344 0.0320 141.60 < 0.001
Non-Disadvantaged log(% KS4 pupils who are disadvantaged) -0.0418 0.0024 -17.70 < 0.001
Non-Disadvantaged log(% overall absence) -0.1722 0.0037 -46.27 < 0.001
Non-Disadvantaged log(% pupils with English not first language) 0.0080 0.0010 8.31 < 0.001
Non-Disadvantaged % KS4 pupils with low prior attainment (KS2) -0.0052 0.0001 -45.89 < 0.001
Non-Disadvantaged Admissions: Other non-selective (ref: non-sel. in highly sel. area) -0.0039 0.0065 -0.60 0.551
Non-Disadvantaged Admissions: Selective (ref: non-sel. in highly sel. area) 0.1194 0.0054 21.96 < 0.001
Non-Disadvantaged Gorard segregation index -0.0552 0.0430 -1.28 0.2
Non-Disadvantaged Teachers remaining in same school (FTE) 0.0005 0.0000 11.81 < 0.001
Non-Disadvantaged % teachers on leadership pay range -0.0011 0.0002 -7.12 < 0.001
Non-Disadvantaged log(Avg. teacher sickness days) -0.0174 0.0019 -9.27 < 0.001

6.2 Imputed Panel: Coefficient Comparison Plot

Show code
imputed_coef_data <- map_dfr(names(imputed_models), function(mn) {
  m <- imputed_models[[mn]]
  fe <- as.data.frame(summary(m)$coefficients)
  fe$term <- rownames(fe)

  fe %>%
    filter(term != "(Intercept)") %>%
    transmute(
      model = outcome_labels[mn],
      term_raw = term,
      term_label = sapply(term, function(t) {
        raw <- gsub("^log\\((.+)\\)$", "\\1", t)
        lbl <- var_label(raw)
        if (grepl("^log\\(", t)) lbl <- paste0("log(", lbl, ")")
        lbl
      }),
      estimate = Estimate,
      se = `Std. Error`,
      ci_lo = Estimate - 1.96 * `Std. Error`,
      ci_hi = Estimate + 1.96 * `Std. Error`
    )
})

ggplot(imputed_coef_data,
       aes(x = estimate, y = reorder(term_label, estimate),
           colour = model)) +
  geom_vline(xintercept = 0, linetype = "dashed", colour = "grey50") +
  geom_pointrange(aes(xmin = ci_lo, xmax = ci_hi),
                  position = position_dodge(width = 0.6), size = 0.4) +
  scale_colour_manual(values = c("All Pupils" = "#2e6260",
                                 "Disadvantaged" = "#4e3c56",
                                 "Non-Disadvantaged" = "#abc766"),
                      name = NULL) +
  labs(x = "Estimate (log scale)", y = NULL,
       subtitle = "Points = estimate; bars = 95% CI. Analysis E (imputed full, all 4 years)") +
  theme_pub
Figure 12: Fixed effect estimates with 95% confidence intervals (imputed full panel models, Analysis E)

6.3 Imputed Panel: Random Effects & Model Fit

Show code
imputed_fit <- map_dfr(names(imputed_models), function(mn) {
  m <- imputed_models[[mn]]
  d <- imputed_diag[[mn]]
  vc <- as.data.frame(VarCorr(m))

  var_year   <- vc$vcov[vc$grp == "year_label"]
  var_ofsted <- vc$vcov[vc$grp == "OFSTEDRATING_1"]
  var_la     <- vc$vcov[vc$grp == "LANAME:gor_name"]
  var_region <- vc$vcov[vc$grp == "gor_name"]
  var_resid  <- d$sigma^2

  tibble(
    Model = outcome_labels[mn],
    var_year   = round(var_year, 6),
    var_ofsted = round(var_ofsted, 6),
    var_la     = round(var_la, 6),
    var_region = round(var_region, 6),
    var_resid  = round(var_resid, 6),
    r2m = round(d$r2_marginal, 4),
    r2c = round(d$r2_conditional, 4),
    N = d$n_obs
  )
})

names(imputed_fit)[2:8] <- c(
  paste0("\u03c3\u00b2", "<sub>year</sub>"),
  paste0("\u03c3\u00b2", "<sub>Ofsted</sub>"),
  paste0("\u03c3\u00b2", "<sub>LA</sub>"),
  paste0("\u03c3\u00b2", "<sub>region</sub>"),
  paste0("\u03c3\u00b2", "<sub>resid</sub>"),
  "R\u00b2<sub>m</sub>",
  "R\u00b2<sub>c</sub>"
)

imputed_fit %>%
  kable(format = "html", align = "lrrrrrrrrr", escape = FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, font_size = 11)
Table 10: Random effect variance components and model fit statistics (imputed full panel models, Analysis E)
Model σ²year σ²Ofsted σ²LA σ²region σ²resid R²m R²c N
All Pupils 0.001955 0.002238 0.000885 0.000286 0.008811 0.6319 0.7712 12199
Disadvantaged 0.003536 0.002841 0.001400 0.000842 0.013602 0.5556 0.7280 12060
Non-Disadvantaged 0.001489 0.001677 0.001005 0.000251 0.005485 0.6175 0.7883 12060

6.4 Analysis A vs Analysis E: Coefficient Comparison

Adding one year of carry-forward imputed data (Analysis E) produces coefficients that are very close to the original 3-year full model (Analysis A). This is reassuring: three-quarters of the data is identical, and the imputed fourth year uses conservative carry-forward values.
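The stability claim can be eyeballed directly from Table 11. For example, comparing three All Pupils coefficients copied from the table (the comparison logic here is illustrative, not part of the report pipeline):

```r
# Coefficients copied from Table 11 (All Pupils column)
a <- c(fsm = -0.06369, absence = -0.20986, sickness = -0.01394)  # Analysis A
e <- c(fsm = -0.06748, absence = -0.21323, sickness = -0.01456)  # Analysis E

max(abs(a - e))  # largest shift among these three is under 0.004 on the log scale
```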

Show code
# Helper: relabel raw lmer coefficient names to human-readable form
relabel_terms <- function(term) {
  case_when(
    term == "(Intercept)" ~ "(Intercept)",
    term == "log(PTFSM6CLA1A)" ~ "log(% FSM)",
    term == "log(PERCTOT)" ~ "log(% Absence)",
    term == "log(PNUMEAL)" ~ "log(% EAL)",
    term == "PTPRIORLO" ~ "% Low Prior Attainment",
    term == "gorard_segregation" ~ "Gorard Segregation",
    term == "ADMPOL_PTOTHER NON SEL" ~ "Admissions: Other non-selective (ref: non-sel. in highly sel. area)",
    term == "ADMPOL_PTSEL" ~ "Admissions: Selective (ref: non-sel. in highly sel. area)",
    term == "remained_in_the_same_school" ~ "Teacher Retention",
    term == "teachers_on_leadership_pay_range_percent" ~ "Leadership Pay %",
    term == "log(average_number_of_days_taken)" ~ "log(Teacher Sickness)",
    TRUE ~ term
  )
}

a_vs_e <- bind_rows(
  tibble(Analysis = "A (Full, 3 yrs)",
         Term = names(fixef(panel_models$all)),
         `All Pupils` = fixef(panel_models$all),
         Disadvantaged = fixef(panel_models$disadvantaged),
         `Non-Disadvantaged` = fixef(panel_models$non_disadvantaged)),
  tibble(Analysis = "E (Imputed, 4 yrs)",
         Term = names(fixef(imputed_models$all)),
         `All Pupils` = fixef(imputed_models$all),
         Disadvantaged = fixef(imputed_models$disadvantaged),
         `Non-Disadvantaged` = fixef(imputed_models$non_disadvantaged))
) %>%
  mutate(Term = relabel_terms(Term),
         across(where(is.numeric), \(x) round(x, 5)))

a_vs_e %>%
  kable(align = "llrrr") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, font_size = 11) %>%
  collapse_rows(columns = 1, valign = "top")
Table 11: Coefficient comparison: Analysis A (3-year full) vs Analysis E (4-year imputed full)
Analysis Term All Pupils Disadvantaged Non-Disadvantaged
A (Full, 3 yrs) (Intercept) 4.63138 4.35957 4.52410
log(% FSM) -0.06369 0.01083 -0.03686
log(% Absence) -0.20986 -0.29610 -0.16846
log(% EAL) 0.00623 0.02321 0.00783
% Low Prior Attainment -0.00612 -0.00587 -0.00564
Admissions: Other non-selective (ref: non-sel. in highly sel. area) 0.00101 -0.01940 0.00116
Admissions: Selective (ref: non-sel. in highly sel. area) 0.10698 0.27895 0.11831
Gorard Segregation -0.01783 0.03062 -0.04846
Teacher Retention 0.00049 -0.00001 0.00046
Leadership Pay % -0.00138 -0.00100 -0.00141
log(Teacher Sickness) -0.01394 -0.01816 -0.01649
E (Imputed, 4 yrs) (Intercept) 4.63282 4.36195 4.53439
log(% FSM) -0.06748 0.00765 -0.04180
log(% Absence) -0.21323 -0.30476 -0.17217
log(% EAL) 0.00586 0.02322 0.00798
% Low Prior Attainment -0.00575 -0.00533 -0.00521
Admissions: Other non-selective (ref: non-sel. in highly sel. area) 0.00057 -0.02047 -0.00386
Admissions: Selective (ref: non-sel. in highly sel. area) 0.10795 0.27801 0.11937
Gorard Segregation -0.03289 -0.00440 -0.05518
Teacher Retention 0.00050 0.00003 0.00046
Leadership Pay % -0.00109 -0.00079 -0.00115
log(Teacher Sickness) -0.01456 -0.01839 -0.01744

7 Stepwise Model Building

The table below shows how the model develops as predictors are added, using the All Pupils outcome. This stepwise approach reveals confounding and mediating relationships between variables — when a coefficient changes substantially as new predictors enter, it indicates that the earlier estimate was partly capturing the effect of the omitted variable.

Five nested models are fitted to the same data (Analysis E, all 4 years):

  1. Model 1 — FSM only: a single predictor measuring school-level disadvantage
  2. Model 2 — adds overall absence rate alongside FSM
  3. Model 3 — adds % low prior attainment, isolating the mediating role of KS2 achievement
  4. Model 4 — all nine fixed-effect predictors (no random effects)
  5. Model 5 — the full multilevel model with random intercepts for year, Ofsted rating, and local authority nested within region
Show code
# ---- Prepare data ----
step_data <- imputed_model_data %>%
  filter(!is.na(ATT8SCR), ATT8SCR > 0)

# ---- Fit 5 nested models ----
m1 <- lm(log(ATT8SCR) ~ log(PTFSM6CLA1A),
         data = step_data)

m2 <- lm(log(ATT8SCR) ~ log(PTFSM6CLA1A) + log(PERCTOT),
         data = step_data)

m3 <- lm(log(ATT8SCR) ~ log(PTFSM6CLA1A) + log(PERCTOT) + PTPRIORLO,
         data = step_data)

m4 <- lm(log(ATT8SCR) ~ log(PTFSM6CLA1A) + log(PERCTOT) + log(PNUMEAL) +
           PTPRIORLO + ADMPOL_PT + gorard_segregation +
           remained_in_the_same_school +
           teachers_on_leadership_pay_range_percent +
           log(average_number_of_days_taken),
         data = step_data)

m5 <- lmer(log(ATT8SCR) ~ log(PTFSM6CLA1A) + log(PERCTOT) + log(PNUMEAL) +
             PTPRIORLO + ADMPOL_PT + gorard_segregation +
             remained_in_the_same_school +
             teachers_on_leadership_pay_range_percent +
             log(average_number_of_days_taken) +
             (1 | year_label) + (1 | OFSTEDRATING_1) + (1 | gor_name/LANAME),
           data = step_data, REML = TRUE, na.action = na.exclude,
           control = lmerControl(optimizer = "bobyqa",
                                 optCtrl = list(maxfun = 20000)))

# ---- Helper: extract coefficients with stars ----
# NB: `is_lmer` is accepted but unused — lm and lmerTest summaries both
# expose a "Pr(>|t|)" column, which the lookup below handles generically
extract_coefs <- function(model, is_lmer = FALSE) {
  s <- summary(model)$coefficients
  df <- as.data.frame(s)
  df$term <- rownames(df)

  p_col <- if ("Pr(>|t|)" %in% names(df)) "Pr(>|t|)" else "Pr(>|z|)"
  t_col <- if ("t value" %in% names(df)) "t value" else "z value"

  df %>%
    transmute(
      term = term,
      estimate = Estimate,
      t_value = .data[[t_col]],
      p_value = .data[[p_col]],
      stars = case_when(
        p_value < 0.001 ~ "***",
        p_value < 0.01  ~ "**",
        p_value < 0.05  ~ "*",
        TRUE ~ ""
      ),
      est_display = paste0(format(round(estimate, 3), nsmall = 3), stars),
      t_display = format(round(t_value, 2), nsmall = 2)
    )
}

c1 <- extract_coefs(m1)
c2 <- extract_coefs(m2)
c3 <- extract_coefs(m3)
c4 <- extract_coefs(m4)
c5 <- extract_coefs(m5, is_lmer = TRUE)

# ---- Build unified term list (in model order) ----
# Use Model 5 (full) order, then add any terms not in it
term_order <- c5$term
all_terms <- c(term_order, setdiff(unique(c(c1$term, c2$term, c3$term, c4$term)), term_order))

# ---- Assemble fixed effects table ----
step_tbl <- tibble(term = all_terms) %>%
  left_join(c1 %>% select(term, est1 = est_display, t1 = t_display), by = "term") %>%
  left_join(c2 %>% select(term, est2 = est_display, t2 = t_display), by = "term") %>%
  left_join(c3 %>% select(term, est3 = est_display, t3 = t_display), by = "term") %>%
  left_join(c4 %>% select(term, est4 = est_display, t4 = t_display), by = "term") %>%
  left_join(c5 %>% select(term, est5 = est_display, t5 = t_display), by = "term") %>%
  mutate(across(everything(), \(x) replace_na(x, ""))) %>%
  mutate(term = relabel_terms(term))

n_fixed <- nrow(step_tbl)

# ---- Random effects (Model 5 only) ----
vc <- as.data.frame(VarCorr(m5))
# vc has columns: grp, var1, var2, vcov, sdcor

re_rows <- tibble(
  term = c("", paste0("  ", vc$grp)),
  est1 = "", t1 = "",
  est2 = "", t2 = "",
  est3 = "", t3 = "",
  est4 = "", t4 = "",
  est5 = c("", format(round(vc$sdcor, 3), nsmall = 3)),
  t5   = c("", rep("(SD)", nrow(vc)))
)
# Label the header row
re_rows$term[1] <- "Random effects"
re_rows$est5[1] <- ""
re_rows$t5[1] <- ""

# Clean up group names for display
re_rows <- re_rows %>%
  mutate(term = case_when(
    term == "Random effects" ~ "Random effects",
    grepl("LANAME:gor_name", term) ~ "  LA (nested in region)",
    grepl("gor_name", term) & !grepl("LANAME", term) ~ "  Region",
    grepl("OFSTEDRATING_1", term) ~ "  Ofsted rating",
    grepl("year_label", term) ~ "  Year",
    grepl("Residual", term) ~ "  Residual",
    TRUE ~ term
  ))

# ---- R² and N rows ----
r2_m1 <- round(summary(m1)$r.squared, 4)
r2_m2 <- round(summary(m2)$r.squared, 4)
r2_m3 <- round(summary(m3)$r.squared, 4)
r2_m4 <- round(summary(m4)$r.squared, 4)
r2_m5 <- performance::r2(m5)
r2m_m5 <- round(r2_m5$R2_marginal, 4)
r2c_m5 <- round(r2_m5$R2_conditional, 4)

n_obs <- nrow(step_data)

footer_rows <- tibble(
  term = c("", "R²", "R² (conditional)", "N"),
  est1 = c("", as.character(r2_m1), "", format(n_obs, big.mark = ",")),
  t1   = c("", "", "", ""),
  est2 = c("", as.character(r2_m2), "", format(n_obs, big.mark = ",")),
  t2   = c("", "", "", ""),
  est3 = c("", as.character(r2_m3), "", format(n_obs, big.mark = ",")),
  t3   = c("", "", "", ""),
  est4 = c("", as.character(r2_m4), "", format(n_obs, big.mark = ",")),
  t4   = c("", "", "", ""),
  est5 = c("", as.character(r2m_m5), as.character(r2c_m5), format(n_obs, big.mark = ",")),
  t5   = c("", "(marginal)", "(fixed + random)", "")
)

full_tbl <- bind_rows(step_tbl, re_rows, footer_rows)

# ---- Render with kableExtra ----
names(full_tbl) <- c("Variable",
                      "Est.", "t",
                      "Est. ", "t ",
                      "Est.  ", "t  ",
                      "Est.   ", "t   ",
                      "Est.    ", "t    ")

kbl(full_tbl, format = "html", align = "lrlrlrlrlrl", escape = FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, font_size = 10) %>%
  add_header_above(c(" " = 1,
                      "Model 1:\nFSM only" = 2,
                      "Model 2:\n+ Absence" = 2,
                      "Model 3:\n+ Prior att." = 2,
                      "Model 4:\nAll fixed" = 2,
                      "Model 5:\nFull multilevel" = 2)) %>%
  row_spec(n_fixed + 1, bold = TRUE, extra_css = "border-top: 2px solid #666;") %>%
  row_spec(n_fixed + nrow(re_rows) + 1, extra_css = "border-top: 2px solid #666;") %>%
  row_spec(n_fixed + nrow(re_rows) + 2, bold = TRUE) %>%
  row_spec(n_fixed + nrow(re_rows) + 3, bold = TRUE) %>%
  row_spec(n_fixed + nrow(re_rows) + 4, bold = TRUE) %>%
  footnote(general = "Significance: *** p < 0.001, ** p < 0.01, * p < 0.05. Models 1–4 are OLS; Model 5 is a multilevel model (lmer) with random intercepts. Random effects shown as standard deviations.",
           general_title = "")
Table 12: Stepwise model building: All Pupils (Analysis E data)
Variable | Model 1: FSM only (Est., t) | Model 2: + Absence (Est., t) | Model 3: + Prior att. (Est., t) | Model 4: All fixed (Est., t) | Model 5: Full multilevel (Est., t)
(Intercept) 4.473*** 621.25 5.006*** 579.97 4.833*** 547.18 4.628*** 319.11 4.633*** 125.61
log(% FSM) -0.201*** -88.94 -0.122*** -59.66 -0.073*** -33.46 -0.071*** -28.60 -0.067*** -24.90
log(% Absence) -0.364*** -82.87 -0.287*** -65.28 -0.242*** -51.65 -0.213*** -46.03
log(% EAL) 0.014*** 13.69 0.006*** 4.92
% Low Prior Attainment -0.006*** -45.75 -0.005*** -37.59 -0.006*** -41.35
Admissions: Other non-selective (ref: non-sel. in highly sel. area) 0.051*** 11.01 0.001 0.08
Admissions: Selective (ref: non-sel. in highly sel. area) 0.140*** 19.96 0.108*** 16.20
Gorard Segregation 0.052 1.92 -0.033 -0.69
Teacher Retention 0.001*** 12.92 0.000*** 10.20
Leadership Pay % -0.001*** -6.10 -0.001*** -5.39
log(Teacher Sickness) -0.016*** -6.08 -0.015*** -6.17
Random effects
LA (nested in region) 0.030 (SD)
Region 0.017 (SD)
Ofsted rating 0.047 (SD)
Year 0.044 (SD)
Residual 0.094 (SD)
0.3932 0.6117 0.6686 0.6929 0.6319 (marginal)
R² (conditional) 0.7712 (fixed + random)
N 12,210 12,210 12,210 12,210 12,210
Significance: *** p < 0.001, ** p < 0.01, * p < 0.05. Models 1–4 are OLS; Model 5 is a multilevel model (lmer) with random intercepts. Random effects shown as standard deviations.

7.0.1 What the stepwise results reveal

Model 1 → Model 2 (adding absence): When FSM is the sole predictor, its coefficient is -0.201 — but when absence is added in Model 2, the FSM coefficient shifts to -0.122. This change reveals confounding: schools with higher FSM rates also tend to have higher absence rates, so Model 1’s FSM coefficient was partly capturing the absence–attainment relationship. The absence coefficient itself (-0.364) is strongly negative, confirming that higher absence is associated with lower attainment. R² increases from 0.3932 to 0.6117.
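This confounding pattern can be illustrated with a toy simulation (hypothetical numbers chosen for illustration, not the actual model data): when a predictor correlated with FSM is omitted, the FSM slope absorbs part of its effect.

```r
# Hypothetical omitted-variable sketch: absence rises with FSM, so a model
# that omits absence attributes part of the absence effect to FSM.
set.seed(1)
n       <- 5000
fsm     <- rnorm(n)
absence <- 0.6 * fsm + rnorm(n)           # absence correlated with FSM
att8    <- -0.07 * fsm - 0.25 * absence + rnorm(n, sd = 0.3)

b_short <- unname(coef(lm(att8 ~ fsm))["fsm"])            # absence omitted
b_long  <- unname(coef(lm(att8 ~ fsm + absence))["fsm"])  # absence included
# short-model slope ~= -0.07 + (-0.25 * 0.6) = -0.22; long-model ~= -0.07
```

The short model's FSM slope equals the true slope plus the absence effect times the absence-on-FSM regression slope, mirroring the -0.201 → -0.122 shift observed in the table.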

Model 2 → Model 3 (adding % low prior attainment): Adding a single variable — the proportion of pupils with low KS2 prior attainment — shifts the FSM coefficient from -0.122 to -0.073. This is one of the clearest examples of mediation in the model: disadvantaged schools have more pupils arriving with low prior attainment, and much of the apparent FSM–attainment relationship operates through this pathway. The prior attainment coefficient (-0.006) is strongly negative. R² jumps to 0.6686, a substantial increase from a single additional predictor.

Model 3 → Model 4 (adding remaining fixed effects): The FSM coefficient barely moves (-0.073 to -0.071) as EAL, admissions policy, segregation, and workforce variables enter. The absence coefficient moves from -0.287 to -0.242. R² rises to 0.6929, showing that the full set of school-level characteristics explains more variation, though the incremental gain from these six variables is more modest than the jump from adding prior attainment alone.

Model 4 → Model 5 (adding random effects): The fixed-effect coefficients are broadly similar (R²m = 0.6319), but the conditional R² (R²c = 0.7712) shows a substantial jump. The difference between R²m and R²c represents variance explained by the grouping structure — knowing which Ofsted category, region, and local authority a school belongs to adds considerable explanatory power beyond the nine measured predictors. The random effect standard deviations in the table show how much the intercept varies across each grouping level. This suggests that unobserved LA-level and regional factors (funding levels, local labour markets, housing costs, historical policy choices) contribute meaningfully to attainment differences.
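As a sanity check, the reported conditional R² can be approximately recovered from the random-effect SDs in the table via the Nakagawa-style variance decomposition. This is a back-of-envelope sketch using the rounded values reported for Model 5 (All Pupils); small discrepancies reflect that rounding.

```r
# Variance decomposition from the reported Model 5 values (All Pupils).
# The fixed-effect variance Vf is backed out from the reported marginal R².
sd_re  <- c(la = 0.030, region = 0.017, ofsted = 0.047, year = 0.044)
sd_res <- 0.094
r2m    <- 0.6319                      # reported marginal R-squared

v_re  <- sum(sd_re^2)                 # total random-intercept variance
v_res <- sd_res^2                     # residual variance
# R2m = Vf / (Vf + Vre + Ve)  =>  solve for Vf
v_f <- r2m * (v_re + v_res) / (1 - r2m)

r2c <- (v_f + v_re) / (v_f + v_re + v_res)
round(r2c, 3)                         # close to the reported 0.7712
```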

7.1 Stepwise helper function

7.2 Disadvantaged Pupils

Show code
sw_dis <- fit_stepwise(imputed_model_data, "ATT8SCR_FSM6CLA1A", "Disadvantaged Pupils")
sw_dis$table
Table 13: Stepwise model building: Disadvantaged Pupils (Analysis E data)
Cells show coefficient estimates with t-statistics in parentheses; random effects are standard deviations.

| Variable | Model 1: FSM only | Model 2: + Absence | Model 3: + Prior att. | Model 4: All fixed | Model 5: Full multilevel |
|----------|-------------------|--------------------|-----------------------|--------------------|--------------------------|
| (Intercept) | 4.066*** (378.43) | 4.837*** (408.17) | 4.628*** (374.31) | 4.331*** (230.83) | 4.362*** (93.84) |
| log(% FSM) | -0.140*** (-41.73) | -0.021*** (-7.32) | 0.039*** (12.34) | 0.028*** (8.47) | 0.008* (2.08) |
| log(% Absence) | — | -0.534*** (-90.92) | -0.444*** (-74.15) | -0.352*** (-58.60) | -0.305*** (-52.15) |
| log(% EAL) | — | — | — | 0.033*** (24.84) | 0.023*** (15.52) |
| % Low Prior Attainment | — | — | -0.007*** (-39.31) | -0.005*** (-28.98) | -0.005*** (-29.94) |
| Admissions: Other non-selective (ref: non-sel. in highly sel. area) | — | — | — | 0.017** (2.89) | -0.020* (-2.14) |
| Admissions: Selective (ref: non-sel. in highly sel. area) | — | — | — | 0.320*** (35.08) | 0.278*** (32.80) |
| Gorard Segregation | — | — | — | 0.029 (0.84) | -0.004 (-0.07) |
| Teacher Retention | — | — | — | 0.000*** (4.63) | 0.000 (0.42) |
| Leadership Pay % | — | — | — | -0.001*** (-3.62) | -0.001** (-3.13) |
| log(Teacher Sickness) | — | — | — | -0.017*** (-5.11) | -0.018*** (-6.21) |
| *Random effects (SD)* | | | | | |
| LA (nested in region) | — | — | — | — | 0.037 |
| Region | — | — | — | — | 0.029 |
| Ofsted rating | — | — | — | — | 0.053 |
| Year | — | — | — | — | 0.059 |
| Residual | — | — | — | — | 0.117 |
| R² | 0.1261 | 0.4813 | 0.5404 | 0.6264 | 0.5556 (marginal) |
| R² (conditional) | — | — | — | — | 0.728 (fixed + random) |
| N | 12,071 | 12,071 | 12,071 | 12,071 | 12,071 |
Significance: *** p < 0.001, ** p < 0.01, * p < 0.05. Models 1–4 are OLS; Model 5 is a multilevel model (lmer). Random effects shown as standard deviations.

7.2.1 What the stepwise results reveal — disadvantaged pupils

Model 1 → Model 2 (adding absence): For disadvantaged pupils, the FSM coefficient starts at -0.14 and shifts to -0.021 when absence enters. As with all pupils, the confounding between FSM and absence is clear — schools with higher deprivation also tend to have higher absence. R² increases from 0.1261 to 0.4813.

Model 2 → Model 3 (adding % low prior attainment): The FSM coefficient moves from -0.021 to 0.039 — it flips sign, and the mediating role of prior attainment is substantial here too. The prior attainment coefficient (-0.007) is strongly negative. R² jumps to 0.5404.

Model 3 → Model 4 (adding remaining fixed effects): The FSM coefficient shifts to 0.028, still positive — the sign flip that appeared when prior attainment entered in Model 3 survives the full set of controls. With prior attainment and the other controls in place, a higher school-level FSM rate is positively associated with disadvantaged attainment, suggesting that schools with more concentrated disadvantage may develop stronger support structures for their FSM-eligible pupils. R² rises to 0.6264.

Model 4 → Model 5 (adding random effects): The marginal R² is 0.5556 and the conditional R² rises to 0.728. The grouping structure again adds substantial explanatory power.

7.3 Non-Disadvantaged Pupils

Show code
sw_non <- fit_stepwise(imputed_model_data, "ATT8SCR_NFSM6CLA1A", "Non-Disadvantaged Pupils")
sw_non$table
Table 14: Stepwise model building: Non-Disadvantaged Pupils (Analysis E data)
Cells show coefficient estimates with t-statistics in parentheses; random effects are standard deviations.

| Variable | Model 1: FSM only | Model 2: + Absence | Model 3: + Prior att. | Model 4: All fixed | Model 5: Full multilevel |
|----------|-------------------|--------------------|-----------------------|--------------------|--------------------------|
| (Intercept) | 4.401*** (653.36) | 4.859*** (632.16) | 4.687*** (609.22) | 4.484*** (367.08) | 4.534*** (141.60) |
| log(% FSM) | -0.157*** (-74.49) | -0.086*** (-45.85) | -0.037*** (-18.92) | -0.038*** (-17.51) | -0.042*** (-17.70) |
| log(% Absence) | — | -0.317*** (-83.25) | -0.243*** (-65.16) | -0.197*** (-50.31) | -0.172*** (-46.27) |
| log(% EAL) | — | — | — | 0.014*** (16.33) | 0.008*** (8.31) |
| % Low Prior Attainment | — | — | -0.006*** (-52.21) | -0.005*** (-42.92) | -0.005*** (-45.89) |
| Admissions: Other non-selective (ref: non-sel. in highly sel. area) | — | — | — | 0.060*** (15.79) | -0.004 (-0.60) |
| Admissions: Selective (ref: non-sel. in highly sel. area) | — | — | — | 0.153*** (25.87) | 0.119*** (21.96) |
| Gorard Segregation | — | — | — | 0.055* (2.42) | -0.055 (-1.28) |
| Teacher Retention | — | — | — | 0.001*** (15.18) | 0.000*** (11.81) |
| Leadership Pay % | — | — | — | -0.002*** (-7.97) | -0.001*** (-7.12) |
| log(Teacher Sickness) | — | — | — | -0.018*** (-8.35) | -0.017*** (-9.27) |
| *Random effects (SD)* | | | | | |
| LA (nested in region) | — | — | — | — | 0.032 |
| Region | — | — | — | — | 0.016 |
| Ofsted rating | — | — | — | — | 0.041 |
| Year | — | — | — | — | 0.039 |
| Residual | — | — | — | — | 0.074 |
| R² | 0.315 | 0.5649 | 0.6453 | 0.6844 | 0.6175 (marginal) |
| R² (conditional) | — | — | — | — | 0.7883 (fixed + random) |
| N | 12,071 | 12,071 | 12,071 | 12,071 | 12,071 |
Significance: *** p < 0.001, ** p < 0.01, * p < 0.05. Models 1–4 are OLS; Model 5 is a multilevel model (lmer). Random effects shown as standard deviations.

7.3.1 What the stepwise results reveal — non-disadvantaged pupils

Model 1 → Model 2 (adding absence): For non-disadvantaged pupils, the FSM coefficient begins at -0.157 and shifts to -0.086 when absence is added. The pattern mirrors the all-pupils model — the confounding between FSM and absence applies regardless of pupil subgroup. R² increases from 0.315 to 0.5649.

Model 2 → Model 3 (adding % low prior attainment): The FSM coefficient moves from -0.086 to -0.037. Prior attainment (-0.006) again absorbs a substantial part of the FSM-attainment relationship. R² jumps to 0.6453.

Model 3 → Model 4 (adding remaining fixed effects): With the full set of predictors, the FSM coefficient settles at -0.038. Unlike the disadvantaged model, it remains negative — the contextual effect of school-level deprivation continues to drag down attainment for non-FSM pupils even after extensive controls. R² rises to 0.6844.

Model 4 → Model 5 (adding random effects): The marginal R² is 0.6175 and the conditional R² reaches 0.7883.

7.4 Comparing Disadvantaged and Non-Disadvantaged Models

The most striking difference between the two subgroup models is the behaviour of the % FSM coefficient as controls are added:

ImportantThe FSM sign flip

In the disadvantaged model, the FSM coefficient flips from negative to positive as prior attainment and other controls enter the model — from -0.14 (Model 1) to 0.008 (Model 5). This means that, after accounting for intake characteristics, schools with higher concentrations of disadvantaged pupils achieve better results for their FSM-eligible pupils than would be expected. This is consistent with a “specialisation” effect: schools that serve many disadvantaged pupils may develop more effective support programmes, allocate Pupil Premium more strategically, or benefit from a critical mass of targeted interventions.

In contrast, the non-disadvantaged model shows the FSM coefficient remains negative throughout: from -0.157 (Model 1) to -0.042 (Model 5). For non-FSM pupils, attending a school with a higher proportion of disadvantaged peers is associated with lower attainment even after extensive controls — a contextual or compositional disadvantage that persists.

Beyond the FSM flip, the two models differ in several other important ways:

  • Absence has a stronger negative effect on disadvantaged pupils (-0.305) than on non-disadvantaged pupils (-0.172). Disadvantaged pupils are more vulnerable to the effects of missing school, possibly because they have fewer resources to catch up outside the classroom.

  • Prior attainment (% low KS2) has a similarly large negative coefficient in both models (-0.005 for each, at this rounding), reflecting that schools with many low-attaining pupils at entry face a steeper challenge for all pupil groups.

  • Selective admissions carries a larger coefficient for disadvantaged pupils (0.278) than for non-disadvantaged pupils (0.119), suggesting that the grammar school attainment premium is not equally shared across the pupil population.

  • EAL (% English as additional language) shows a positive coefficient in both models (0.023 for disadvantaged, 0.008 for non-disadvantaged). Schools with more EAL pupils tend to have higher attainment after controlling for deprivation, possibly reflecting the academic strengths that many bilingual pupils bring, or the urban location of these schools which correlates with other positive factors.

  • Teacher retention has a small positive coefficient in both models (rounding to 0.000 at three decimal places), but in Model 5 it is highly significant only for non-disadvantaged pupils (t = 11.81, versus t = 0.42 for disadvantaged) — staff stability is associated with higher attainment, though the evidence is much weaker for the disadvantaged group.

These patterns underscore that the same school-level factors do not operate equally across pupil subgroups. Policy interventions need to account for these differential effects — absence reduction and the support structures associated with concentrated disadvantage matter most for disadvantaged pupils, while school composition and peer context weigh more heavily on non-disadvantaged pupils.

7.4.1 Variable importance: standardised coefficients

The chart below puts all predictors on a common scale so that their relative importance can be compared directly across pupil groups. Raw model coefficients cannot be compared across variables because the predictors are measured in different units (percentages, indices, days) — a coefficient of −0.15 on log(% FSM) is not directly comparable to −0.003 on % low prior attainment.

The standardisation works as follows: for each predictor, the coefficient is multiplied by the standard deviation of that predictor across all schools in the dataset (computed on the log scale for log-transformed variables, matching how they enter the model). The result — the standardised coefficient — answers the question: “if this predictor moved by one standard deviation across the school population, how much would log(ATT8) change?” This is a measure of practical importance, not statistical significance. A large standardised coefficient means the variable has a big impact on attainment differences across schools; a small one means it contributes little to the variation we observe, even if it is statistically significant.

This differs from the t-value shown in the stepwise tables above, which measures statistical confidence (coefficient ÷ standard error). With 12,000+ observations, even tiny effects can be highly significant (large t-values), so t-values are poor guides to practical importance. The standardised coefficients shown here are the better tool for understanding which levers matter most.
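A minimal simulation (hypothetical numbers) makes the point concrete: with a sample of this size, a slope that explains a fraction of a percent of the variance still earns a large t-value.

```r
# With n ~ 12,000, a practically negligible slope is still "significant".
set.seed(42)
n <- 12000
x <- rnorm(n)
y <- 0.05 * x + rnorm(n)   # true slope explains ~0.25% of the variance

fit <- summary(lm(y ~ x))
fit$coefficients["x", "t value"]   # well above conventional thresholds
fit$r.squared                      # under 0.01: trivial practical importance
```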

The chart also includes the selective admissions dummy variable (grammar schools). Because this is a binary 0/1 indicator rather than a continuous measure, the standardisation is applied in the same way — the coefficient is multiplied by the SD of the 0/1 variable. Since only around 5% of schools are selective, the SD is small (~0.22), so the standardised coefficient reflects the effect weighted by how common selective schools are in the population. This puts it on the same “population-level importance” scale as the continuous predictors: a variable that affects every school modestly will rank higher than one with a dramatic effect on only a few schools. The “other non-selective” dummy is omitted as its coefficient is small and generally non-significant.
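The quoted SD for the selective-admissions dummy follows directly from the variance of a 0/1 indicator, taking the share of selective schools as roughly 5% as stated above:

```r
# SD of a 0/1 indicator with proportion p of 1s is sqrt(p * (1 - p))
p <- 0.05                       # approximate share of selective schools
round(sqrt(p * (1 - p)), 3)     # 0.218, the ~0.22 quoted in the text
```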

Show code
library(plotly)

# ---- Helper: compute standardised coefficients ----
# (outcome_var is unused in the calculation; kept so the three calls share a signature)
standardised_importance <- function(model, data, outcome_var, group_label) {
  fe <- fixef(model)
  fe <- fe[names(fe) != "(Intercept)"]

  coef_info <- tribble(
    ~term,                                     ~data_col,                                   ~transform,
    "log(PTFSM6CLA1A)",                        "PTFSM6CLA1A",                               "log",
    "log(PERCTOT)",                             "PERCTOT",                                   "log",
    "log(PNUMEAL)",                             "PNUMEAL",                                   "log",
    "PTPRIORLO",                                "PTPRIORLO",                                 "none",
    "ADMPOL_PTSEL",                             "ADMPOL_PT",                                 "dummy_SEL",
    "gorard_segregation",                       "gorard_segregation",                        "none",
    "remained_in_the_same_school",              "remained_in_the_same_school",               "none",
    "teachers_on_leadership_pay_range_percent",  "teachers_on_leadership_pay_range_percent",  "none",
    "log(average_number_of_days_taken)",         "average_number_of_days_taken",              "log"
  )

  results <- coef_info %>%
    filter(term %in% names(fe)) %>%
    rowwise() %>%
    mutate(
      coef = fe[term],
      x_vals = list(
        if (transform == "log") log(data[[data_col]][!is.na(data[[data_col]]) & data[[data_col]] > 0])
        else if (transform == "dummy_SEL") as.numeric(data[[data_col]] == "SEL")[!is.na(data[[data_col]])]
        else data[[data_col]][!is.na(data[[data_col]])]
      ),
      sd_x = sd(unlist(x_vals), na.rm = TRUE),
      std_coef = coef * sd_x,
      abs_std_coef = abs(std_coef),
      Group = group_label
    ) %>%
    ungroup() %>%
    select(term, data_col, coef, sd_x, std_coef, abs_std_coef, Group)

  results
}

display_names <- c(
  "log(PTFSM6CLA1A)" = "% Disadvantaged (FSM)",
  "log(PERCTOT)" = "Overall Absence Rate",
  "log(PNUMEAL)" = "% EAL",
  "PTPRIORLO" = "% Low Prior Attainment",
  "ADMPOL_PTSEL" = "Selective Admissions (grammar)",
  "gorard_segregation" = "Gorard Segregation Index",
  "remained_in_the_same_school" = "Teacher Retention",
  "teachers_on_leadership_pay_range_percent" = "Leadership Pay %",
  "log(average_number_of_days_taken)" = "Teacher Sickness Days"
)

# Compute for all three groups using the imputed full models (Analysis E)
imp_all <- standardised_importance(imputed_models$all, imputed_model_data,
                                    "ATT8SCR", "All Pupils")
imp_dis <- standardised_importance(imputed_models$disadvantaged, imputed_model_data,
                                    "ATT8SCR_FSM6CLA1A", "Disadvantaged")
imp_non <- standardised_importance(imputed_models$non_disadvantaged, imputed_model_data,
                                    "ATT8SCR_NFSM6CLA1A", "Non-Disadvantaged")

importance_national <- bind_rows(imp_all, imp_dis, imp_non) %>%
  mutate(display_name = display_names[term])

# ---- Chart ----
importance_chart_data <- importance_national %>%
  mutate(
    display_name = factor(display_name),
    Group = factor(Group, levels = c("All Pupils", "Disadvantaged", "Non-Disadvantaged")),
    tooltip_text = paste0(
      "<b>", display_name, "</b><br>",
      "Group: ", Group, "<br>",
      "Raw coefficient: ", round(coef, 5), "<br>",
      "SD of predictor: ", round(sd_x, 3), "<br>",
      "Standardised coef: ", round(std_coef, 4)
    )
  )

var_order <- importance_chart_data %>%
  group_by(display_name) %>%
  summarise(avg_abs = mean(abs_std_coef)) %>%
  arrange(avg_abs) %>%
  pull(display_name)

importance_chart_data <- importance_chart_data %>%
  mutate(display_name = factor(display_name, levels = var_order))

p_imp <- ggplot(importance_chart_data,
       aes(x = display_name, y = std_coef, fill = Group,
           text = tooltip_text)) +
  geom_col(position = position_dodge(width = 0.8), width = 0.7) +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "grey40") +
  coord_flip() +
  scale_fill_manual(values = c("All Pupils" = "#2e6260",
                                "Disadvantaged" = "#4e3c56",
                                "Non-Disadvantaged" = "#abc766")) +
  labs(x = NULL, y = "Standardised Coefficient (signed)",
       fill = NULL) +
  theme_minimal(base_size = 11) +
  theme(legend.position = "bottom")

ggplotly(p_imp, tooltip = "text") %>%
  plotly::layout(legend = list(orientation = "h", y = -0.15))
Figure 13: Standardised coefficients: relative variable importance across pupil groups (Model 5). Bars show the change in log(ATT8) associated with a one-SD shift in each predictor.

The chart makes several features of the model immediately visible:

  • % FSM flips direction for disadvantaged pupils (positive bar) while remaining strongly negative for non-disadvantaged and all pupils — the sign flip discussed above.
  • Overall absence and % low prior attainment are the two largest negative drivers of attainment across all groups, with absence hitting disadvantaged pupils harder.
  • % EAL is the largest positive predictor in standardised terms, reflecting that schools with more EAL pupils perform better than expected after controlling for deprivation.
  • Selective admissions (grammar schools) shows a strong positive standardised coefficient and, consistent with the raw coefficients in the stepwise tables, the effect is notably larger for disadvantaged pupils than for non-disadvantaged pupils — the grammar school premium is not equally shared.
  • Teacher retention has a modest positive standardised effect across groups.
  • Gorard segregation, leadership pay %, and teacher sickness days have small standardised effects — they are statistically significant in the model but contribute relatively little to the variation in attainment across schools.

8 Observed vs Predicted

The scatter plots below show observed Attainment 8 scores against imputed-full-model fitted values (Analysis E) for each of the four academic years. Schools in Brighton and Hove are highlighted in pink. A linear regression line with R² is overlaid on each panel; the dashed grey line marks perfect prediction.

8.1 All Pupils

Show code
HIGHLIGHT_LA <- "Brighton and Hove"

# Prepare data with highlight flag
pred_data <- panel_data %>%
  filter(!is.na(predicted_ATT8SCR_imputed), !is.na(ATT8SCR)) %>%
  mutate(
    residual = ATT8SCR - predicted_ATT8SCR_imputed,
    abs_resid = abs(residual),
    highlight = ifelse(LANAME == HIGHLIGHT_LA, HIGHLIGHT_LA, "Other"),
    highlight = factor(highlight, levels = c("Other", HIGHLIGHT_LA))
  ) %>%
  arrange(highlight)   # draw highlighted points on top

# Per-year R² labels
r2_labels <- pred_data %>%
  group_by(year_label) %>%
  summarise(
    r2 = cor(ATT8SCR, predicted_ATT8SCR_imputed, use = "complete.obs")^2,
    n  = n(),
    .groups = "drop"
  ) %>%
  mutate(label = paste0("R\u00b2 = ", round(r2, 3), "  (n = ", scales::comma(n), ")"))

ggplot(pred_data, aes(x = predicted_ATT8SCR_imputed, y = ATT8SCR)) +
  geom_abline(slope = 1, intercept = 0, colour = "grey60",
              linewidth = 0.5, linetype = "dashed") +
  geom_point(aes(colour = highlight, alpha = highlight), size = 1) +
  geom_smooth(method = "lm", se = FALSE, colour = "#7b132b", linewidth = 0.6) +
  geom_text(data = r2_labels,
            aes(x = -Inf, y = Inf, label = label),
            inherit.aes = FALSE, hjust = -0.05, vjust = 1.4,
            size = 3.3, colour = "grey30") +
  scale_colour_manual(values = c("Other" = "#49a0c4",
                                  "Brighton and Hove" = "#e16fca")) +
  scale_alpha_manual(values = c("Other" = 0.12,
                                 "Brighton and Hove" = 0.85), guide = "none") +
  facet_wrap(~ year_label, ncol = 2) +
  labs(x = "Predicted ATT8 (imputed full model)", y = "Observed ATT8",
       colour = NULL) +
  coord_equal() +
  theme_pub +
  theme(legend.position = "bottom",
        strip.text = element_text(size = 11))
Figure 14: Observed vs predicted Attainment 8 — All Pupils (imputed full model, by year). Brighton and Hove schools in pink.
Show code
library(plotly)
library(dplyr)
library(scales)

# 1. Prepare data (same logic as the static plot above)
pred_data <- panel_data %>%
  filter(!is.na(predicted_ATT8SCR_imputed), !is.na(ATT8SCR)) %>%
  mutate(
    residual = ATT8SCR - predicted_ATT8SCR_imputed,
    highlight = ifelse(LANAME == HIGHLIGHT_LA, HIGHLIGHT_LA, "Other"),
    highlight = factor(highlight, levels = c("Other", HIGHLIGHT_LA)),
    # Custom tooltip text
    tooltip_text = paste0(
      "LA: ", LANAME,
      "<br>Sch:", SCHNAME,
      "<br>Observed: ", round(ATT8SCR, 2),
      "<br>Predicted: ", round(predicted_ATT8SCR_imputed, 2),
      "<br>Residual: ", round(residual, 2)
    )
  ) %>%
  arrange(highlight)

# 2. Calculate R² labels
r2_labels <- pred_data %>%
  group_by(year_label) %>%
  summarise(
    r2 = cor(ATT8SCR, predicted_ATT8SCR_imputed, use = "complete.obs")^2,
    n  = n(),
    .groups = "drop"
  ) %>%
  mutate(label = paste0("R² = ", round(r2, 3), " (n = ", comma(n), ")"))

# 3. Build the ggplot object
p <- ggplot(pred_data, aes(x = predicted_ATT8SCR_imputed, y = ATT8SCR)) +
  geom_abline(slope = 1, intercept = 0, colour = "grey60",
              linewidth = 0.5, linetype = "dashed") +
  # We add 'text' aesthetic here for the Plotly tooltip
  geom_point(aes(colour = highlight, alpha = highlight, text = tooltip_text), size = 1) +
  geom_smooth(method = "lm", se = FALSE, colour = "#7b132b", linewidth = 0.6) +
  # Note: geom_text can be finicky in ggplotly; using explicit coordinates helps
  geom_text(data = r2_labels,
            aes(x = 20, y = 85, label = label),
            inherit.aes = FALSE, size = 3, colour = "grey30") +
  scale_colour_manual(values = c("Other" = "#49a0c4",
                                 "Brighton and Hove" = "#e16fca")) +
  scale_alpha_manual(values = c("Other" = 0.2,
                                "Brighton and Hove" = 0.9), guide = "none") +
  facet_wrap(~ year_label, ncol = 2) +
  labs(x = "Predicted ATT8 (imputed full model)", y = "Observed ATT8", colour = NULL) +
  theme_minimal() + # Use minimal for cleaner plotly conversion
  theme(legend.position = "bottom")

# 4. Convert to Plotly
ggplotly(p, tooltip = "text") %>%
  layout(
    legend = list(orientation = "h", x = 0.4, y = -0.1),
    margin = list(t = 50) # Space for facet titles
  )

8.2 Disadvantaged Pupils

Show code
pred_disadv <- panel_data %>%
  filter(!is.na(predicted_ATT8SCR_FSM6CLA1A_imputed), !is.na(ATT8SCR_FSM6CLA1A)) %>%
  mutate(
    highlight = ifelse(LANAME == HIGHLIGHT_LA, HIGHLIGHT_LA, "Other"),
    highlight = factor(highlight, levels = c("Other", HIGHLIGHT_LA))
  ) %>%
  arrange(highlight)

r2_disadv <- pred_disadv %>%
  group_by(year_label) %>%
  summarise(
    r2 = cor(ATT8SCR_FSM6CLA1A, predicted_ATT8SCR_FSM6CLA1A_imputed,
             use = "complete.obs")^2,
    n  = n(),
    .groups = "drop"
  ) %>%
  mutate(label = paste0("R\u00b2 = ", round(r2, 3), "  (n = ", scales::comma(n), ")"))

ggplot(pred_disadv, aes(x = predicted_ATT8SCR_FSM6CLA1A_imputed,
                         y = ATT8SCR_FSM6CLA1A)) +
  geom_abline(slope = 1, intercept = 0, colour = "grey60",
              linewidth = 0.5, linetype = "dashed") +
  geom_point(aes(colour = highlight, alpha = highlight), size = 1) +
  geom_smooth(method = "lm", se = FALSE, colour = "#7b132b", linewidth = 0.6) +
  geom_text(data = r2_disadv,
            aes(x = -Inf, y = Inf, label = label),
            inherit.aes = FALSE, hjust = -0.05, vjust = 1.4,
            size = 3.3, colour = "grey30") +
  scale_colour_manual(values = c("Other" = "#49a0c4",
                                  "Brighton and Hove" = "#e16fca")) +
  scale_alpha_manual(values = c("Other" = 0.12,
                                 "Brighton and Hove" = 0.85), guide = "none") +
  facet_wrap(~ year_label, ncol = 2) +
  labs(x = "Predicted ATT8 — Disadvantaged (imputed full model)",
       y = "Observed ATT8 — Disadvantaged",
       colour = NULL) +
  coord_equal() +
  theme_pub +
  theme(legend.position = "bottom",
        strip.text = element_text(size = 11))
Figure 15: Observed vs predicted Attainment 8 — Disadvantaged Pupils (imputed full model, by year). Brighton and Hove schools in pink.
Show code
library(plotly)
library(dplyr)
library(scales)

# 1. Prepare data for Disadvantaged subset
pred_disadv <- panel_data %>%
  filter(!is.na(predicted_ATT8SCR_FSM6CLA1A_imputed), !is.na(ATT8SCR_FSM6CLA1A)) %>%
  mutate(
    residual = ATT8SCR_FSM6CLA1A - predicted_ATT8SCR_FSM6CLA1A_imputed,
    highlight = ifelse(LANAME == HIGHLIGHT_LA, HIGHLIGHT_LA, "Other"),
    highlight = factor(highlight, levels = c("Other", HIGHLIGHT_LA)),
    # Tooltip for the disadvantaged-pupil context (school name shown once, in bold)
    tooltip_text = paste0(
      "<b>", SCHNAME, "</b><br>",
      "LA: ", LANAME,
      "<br>Observed (Disadv): ", round(ATT8SCR_FSM6CLA1A, 2),
      "<br>Predicted (Disadv): ", round(predicted_ATT8SCR_FSM6CLA1A_imputed, 2),
      "<br>Residual: ", round(residual, 2)
    )
  ) %>%
  arrange(highlight)

# 2. Calculate R² for the Disadvantaged model
r2_disadv_stats <- pred_disadv %>%
  group_by(year_label) %>%
  summarise(
    r2 = cor(ATT8SCR_FSM6CLA1A, predicted_ATT8SCR_FSM6CLA1A_imputed, use = "complete.obs")^2,
    n  = n(),
    .groups = "drop"
  ) %>%
  mutate(label = paste0("R² = ", round(r2, 3), " (n = ", comma(n), ")"))

# 3. Build the ggplot object
p_disadv <- ggplot(pred_disadv, aes(x = predicted_ATT8SCR_FSM6CLA1A_imputed,
                                   y = ATT8SCR_FSM6CLA1A)) +
  geom_abline(slope = 1, intercept = 0, colour = "grey60",
              linewidth = 0.5, linetype = "dashed") +
  geom_point(aes(colour = highlight, alpha = highlight, text = tooltip_text), size = 1.2) +
  geom_smooth(method = "lm", se = FALSE, colour = "#7b132b", linewidth = 0.6) +
  # Positioning labels at safe numeric coordinates (e.g., x=10, y=85)
  geom_text(data = r2_disadv_stats,
            aes(x = 10, y = 85, label = label),
            inherit.aes = FALSE, size = 3, colour = "grey30", hjust = 0) +
  scale_colour_manual(values = c("Other" = "#49a0c4",
                                 "Brighton and Hove" = "#e16fca")) +
  scale_alpha_manual(values = c("Other" = 0.15,
                                "Brighton and Hove" = 0.9), guide = "none") +
  facet_wrap(~ year_label, ncol = 2) +
  labs(x = "Predicted ATT8 — Disadvantaged",
       y = "Observed ATT8 — Disadvantaged",
       colour = NULL) +
  theme_minimal() +
  theme(legend.position = "bottom",
        strip.text = element_text(face = "bold"))

# 4. Final Plotly conversion
ggplotly(p_disadv, tooltip = "text") %>%
  layout(
    legend = list(orientation = "h", x = 0.3, y = -0.15),
    margin = list(t = 60)
  )

8.3 Non-Disadvantaged Pupils

Show code
pred_nondisadv <- panel_data %>%
  filter(!is.na(predicted_ATT8SCR_NFSM6CLA1A_imputed), !is.na(ATT8SCR_NFSM6CLA1A)) %>%
  mutate(
    highlight = ifelse(LANAME == HIGHLIGHT_LA, HIGHLIGHT_LA, "Other"),
    highlight = factor(highlight, levels = c("Other", HIGHLIGHT_LA))
  ) %>%
  arrange(highlight)

r2_nondisadv <- pred_nondisadv %>%
  group_by(year_label) %>%
  summarise(
    r2 = cor(ATT8SCR_NFSM6CLA1A, predicted_ATT8SCR_NFSM6CLA1A_imputed,
             use = "complete.obs")^2,
    n  = n(),
    .groups = "drop"
  ) %>%
  mutate(label = paste0("R\u00b2 = ", round(r2, 3), "  (n = ", scales::comma(n), ")"))

ggplot(pred_nondisadv, aes(x = predicted_ATT8SCR_NFSM6CLA1A_imputed,
                            y = ATT8SCR_NFSM6CLA1A)) +
  geom_abline(slope = 1, intercept = 0, colour = "grey60",
              linewidth = 0.5, linetype = "dashed") +
  geom_point(aes(colour = highlight, alpha = highlight), size = 1) +
  geom_smooth(method = "lm", se = FALSE, colour = "#7b132b", linewidth = 0.6) +
  geom_text(data = r2_nondisadv,
            aes(x = -Inf, y = Inf, label = label),
            inherit.aes = FALSE, hjust = -0.05, vjust = 1.4,
            size = 3.3, colour = "grey30") +
  scale_colour_manual(values = c("Other" = "#49a0c4",
                                  "Brighton and Hove" = "#e16fca")) +
  scale_alpha_manual(values = c("Other" = 0.12,
                                 "Brighton and Hove" = 0.85), guide = "none") +
  facet_wrap(~ year_label, ncol = 2) +
  labs(x = "Predicted ATT8 — Non-Disadvantaged (imputed full model)",
       y = "Observed ATT8 — Non-Disadvantaged",
       colour = NULL) +
  coord_equal() +
  theme_pub +
  theme(legend.position = "bottom",
        strip.text = element_text(size = 11))
Figure 16: Observed vs predicted Attainment 8 — Non-Disadvantaged Pupils (imputed full model, by year). Brighton and Hove schools in pink.
Show code
library(plotly)
library(dplyr)
library(scales)

# 1. Prepare data for Non-Disadvantaged subset
pred_nondisadv <- panel_data %>%
  filter(!is.na(predicted_ATT8SCR_NFSM6CLA1A_imputed), !is.na(ATT8SCR_NFSM6CLA1A)) %>%
  mutate(
    residual = ATT8SCR_NFSM6CLA1A - predicted_ATT8SCR_NFSM6CLA1A_imputed,
    highlight = ifelse(LANAME == HIGHLIGHT_LA, HIGHLIGHT_LA, "Other"),
    highlight = factor(highlight, levels = c("Other", HIGHLIGHT_LA)),
    # Tooltip (school name shown once, in bold)
    tooltip_text = paste0(
      "<b>", SCHNAME, "</b><br>",
      "LA: ", LANAME,
      "<br>Observed (Non-Disadv): ", round(ATT8SCR_NFSM6CLA1A, 2),
      "<br>Predicted (Non-Disadv): ", round(predicted_ATT8SCR_NFSM6CLA1A_imputed, 2),
      "<br>Residual: ", round(residual, 2)
    )
  ) %>%
  arrange(highlight)

# 2. Calculate R² for the Non-Disadvantaged model
r2_nondisadv_stats <- pred_nondisadv %>%
  group_by(year_label) %>%
  summarise(
    r2 = cor(ATT8SCR_NFSM6CLA1A, predicted_ATT8SCR_NFSM6CLA1A_imputed, use = "complete.obs")^2,
    n  = n(),
    .groups = "drop"
  ) %>%
  mutate(label = paste0("R² = ", round(r2, 3), " (n = ", comma(n), ")"))

# 3. Build the ggplot object
p_nondisadv <- ggplot(pred_nondisadv, aes(x = predicted_ATT8SCR_NFSM6CLA1A_imputed,
                                         y = ATT8SCR_NFSM6CLA1A)) +
  geom_abline(slope = 1, intercept = 0, colour = "grey60",
              linewidth = 0.5, linetype = "dashed") +
  geom_point(aes(colour = highlight, alpha = highlight, text = tooltip_text), size = 1.2) +
  geom_smooth(method = "lm", se = FALSE, colour = "#7b132b", linewidth = 0.6) +
  # Placed at higher coordinates as Non-Disadv scores are typically higher
  geom_text(data = r2_nondisadv_stats,
            aes(x = 20, y = 88, label = label),
            inherit.aes = FALSE, size = 3, colour = "grey30", hjust = 0) +
  scale_colour_manual(values = c("Other" = "#49a0c4",
                                 "Brighton and Hove" = "#e16fca")) +
  scale_alpha_manual(values = c("Other" = 0.15,
                                "Brighton and Hove" = 0.9), guide = "none") +
  facet_wrap(~ year_label, ncol = 2) +
  labs(x = "Predicted ATT8 — Non-Disadvantaged",
       y = "Observed ATT8 — Non-Disadvantaged",
       colour = NULL) +
  theme_minimal() +
  theme(legend.position = "bottom",
        strip.text = element_text(face = "bold"))

# 4. Final Plotly conversion
ggplotly(p_nondisadv, tooltip = "text") %>%
  layout(
    legend = list(orientation = "h", x = 0.3, y = -0.15),
    margin = list(t = 60)
  )

9 Residual Geography

Show code
if ("gor_name" %in% names(pred_data) && sum(!is.na(pred_data$gor_name)) > 0) {

  # Regional residual summary table (kept for reference; the boxplot below
  # works directly from the raw residuals)
  region_resid <- pred_data %>%
    filter(!is.na(gor_name)) %>%
    group_by(gor_name) %>%
    summarise(
      mean_resid = mean(residual, na.rm = TRUE),
      median_resid = median(residual, na.rm = TRUE),
      n = n(),
      .groups = "drop"
    ) %>%
    arrange(mean_resid)

  ggplot(pred_data %>% filter(!is.na(gor_name)),
         aes(x = reorder(gor_name, residual,
                         FUN = function(x) median(x, na.rm = TRUE)),
             y = residual)) +
    geom_hline(yintercept = 0, linetype = "dashed", colour = "grey50") +
    geom_boxplot(fill = "#2e6260", alpha = 0.3, outlier.size = 0.3, outlier.alpha = 0.2) +
    coord_flip() +
    labs(x = NULL, y = "Residual (Observed - Predicted ATT8)",
         subtitle = "Positive = outperforming model; Negative = underperforming") +
    theme_pub
}
Figure 17: Distribution of model residuals by Government Office Region

10 Disadvantage Gap

Show code
if (all(c("ATT8SCR_FSM6CLA1A", "ATT8SCR_NFSM6CLA1A") %in% names(model_data))) {
  gap_data <- model_data %>%
    filter(!is.na(ATT8SCR_FSM6CLA1A), !is.na(ATT8SCR_NFSM6CLA1A)) %>%
    mutate(gap = ATT8SCR_NFSM6CLA1A - ATT8SCR_FSM6CLA1A)

  gap_summary <- gap_data %>%
    group_by(year_label) %>%
    summarise(
      mean_gap = mean(gap, na.rm = TRUE),
      median_gap = median(gap, na.rm = TRUE),
      q25 = quantile(gap, 0.25, na.rm = TRUE),
      q75 = quantile(gap, 0.75, na.rm = TRUE),
      .groups = "drop"
    )

  ggplot(gap_data, aes(x = year_label, y = gap)) +
    geom_boxplot(fill = "#4e3c56", alpha = 0.3, outlier.size = 0.3, outlier.alpha = 0.2) +
    geom_point(data = gap_summary, aes(y = mean_gap),
               colour = "#4e3c56", size = 3, shape = 18) +
    geom_hline(yintercept = 0, linetype = "dashed", colour = "grey50") +
    labs(x = "Academic Year",
         y = "ATT8 Gap (Non-Disadvantaged - Disadvantaged)",
         subtitle = "Diamond = mean gap; Box = IQR") +
    theme_pub
}
Figure 18: Disadvantage attainment gap by academic year

11 Appendix: Model Formulae

11.1 Panel Model (Analysis A)

All four years pooled with year_label as a random intercept:

log(ATT8SCR) ~ log(PTFSM6CLA1A) + log(PERCTOT) + log(PNUMEAL) +
  PTPRIORLO + ADMPOL_PT + gorard_segregation +
  remained_in_the_same_school +
  teachers_on_leadership_pay_range_percent +
  log(average_number_of_days_taken) +
  (1 | year_label) + (1 | OFSTEDRATING_1) + (1 | gor_name/LANAME)
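Because both the response and several predictors enter on the log scale, the fixed-effect coefficients on logged predictors are elasticities. A minimal base-R sketch of the interpretation (the coefficient value below is hypothetical, for illustration only, not an estimate from this report):

```r
# Hypothetical elasticity for log(PTFSM6CLA1A); illustration only.
beta <- -0.15

# A 1% relative increase in the disadvantage share multiplies the
# expected Attainment 8 score by 1.01^beta:
pct_change_att8 <- (1.01^beta - 1) * 100
round(pct_change_att8, 3)  # roughly -0.149% per 1% increase
```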

11.2 Per-Year Model (Analysis B)

Separate model for each academic year (no year term):

log(ATT8SCR) ~ log(PTFSM6CLA1A) + log(PERCTOT) + log(PNUMEAL) +
  PTPRIORLO + ADMPOL_PT + gorard_segregation +
  remained_in_the_same_school +
  teachers_on_leadership_pay_range_percent +
  log(average_number_of_days_taken) +
  (1 | OFSTEDRATING_1) + (1 | gor_name/LANAME)

11.3 Core Panel Model (Analysis C)

Reduced specification using only the five predictors available for all four years (including 2024-25); the workforce and prior-attainment variables are dropped:

log(ATT8SCR) ~ log(PTFSM6CLA1A) + log(PERCTOT) + log(PNUMEAL) +
  ADMPOL_PT + gorard_segregation +
  (1 | year_label) + (1 | OFSTEDRATING_1) + (1 | gor_name/LANAME)
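The nested term (1 | gor_name/LANAME) used in every specification is lme4 shorthand for a region intercept plus a region-by-LA intercept. A quick sketch confirming the expansion (assuming lme4 is installed; the formula here is abbreviated for brevity):

```r
library(lme4)

# findbars() lists the random-effects terms after slash expansion:
# (1 | gor_name/LANAME) becomes (1 | gor_name) + (1 | gor_name:LANAME),
# so this formula carries three variance components in total.
f <- log(ATT8SCR) ~ log(PTFSM6CLA1A) + (1 | year_label) + (1 | gor_name/LANAME)
findbars(f)
```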

11.4 Core Per-Year Model (Analysis D)

Same five core fixed effects, with a separate model fitted for each year (no year term):

log(ATT8SCR) ~ log(PTFSM6CLA1A) + log(PERCTOT) + log(PNUMEAL) +
  ADMPOL_PT + gorard_segregation +
  (1 | OFSTEDRATING_1) + (1 | gor_name/LANAME)

11.5 Imputed Full Panel Model (Analysis E)

Same 9-predictor specification as Analysis A, but covering all four years (2024-25 uses carry-forward imputed workforce and prior-attainment values). This is the model used in the Shiny app and in the observed-vs-predicted plots.

log(ATT8SCR) ~ log(PTFSM6CLA1A) + log(PERCTOT) + log(PNUMEAL) +
  PTPRIORLO + ADMPOL_PT + gorard_segregation +
  remained_in_the_same_school +
  teachers_on_leadership_pay_range_percent +
  log(average_number_of_days_taken) +
  (1 | year_label) + (1 | OFSTEDRATING_1) + (1 | gor_name/LANAME)
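The carry-forward step can be sketched in base R as last-observation-carried-forward within each school (toy URNs and values below; the report's actual imputation code lives in the Model Experiments report):

```r
# Carry the last observed value forward within an ordered vector.
locf <- function(x) {
  for (i in seq_along(x)[-1]) if (is.na(x[i])) x[i] <- x[i - 1]
  x
}

# Toy panel: two schools, four years, missing workforce values in 2024-25.
toy <- data.frame(
  URN  = rep(c("A", "B"), each = 4),
  year = rep(c("2021-22", "2022-23", "2023-24", "2024-25"), 2),
  sickness_days = c(6.2, 5.8, 6.0, NA,   4.1, NA, 4.5, NA)
)

# Apply per school; rows are assumed already sorted by URN then year.
toy$sickness_days_imp <- ave(toy$sickness_days, toy$URN, FUN = locf)
toy
```

Note that this only fills gaps forward in time, so a school with no earlier observation of a variable still drops out of the imputed model.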

11.6 Variable Definitions

Show code
model_vars <- c(
  "ATT8SCR", "ATT8SCR_FSM6CLA1A", "ATT8SCR_NFSM6CLA1A",
  "PTFSM6CLA1A", "PERCTOT", "PNUMEAL", "PTPRIORLO",
  "ADMPOL_PT", "gorard_segregation",
  "remained_in_the_same_school",
  "teachers_on_leadership_pay_range_percent",
  "average_number_of_days_taken",
  "OFSTEDRATING_1", "gor_name", "LANAME", "year_label"
)

tibble(
  `Variable Code` = model_vars,
  Description = var_labels(model_vars),
  `In Formula As` = case_when(
    model_vars %in% c("PTFSM6CLA1A", "PERCTOT", "PNUMEAL",
                       "remained_in_the_same_school",
                       "average_number_of_days_taken",
                       "ATT8SCR", "ATT8SCR_FSM6CLA1A", "ATT8SCR_NFSM6CLA1A") ~
      paste0("log(", model_vars, ")"),
    model_vars %in% c("OFSTEDRATING_1", "gor_name", "LANAME", "year_label") ~
      paste0("(1 | ", model_vars, ")"),
    model_vars == "ADMPOL_PT" ~
      "Two dummy variables (treatment-coded): ADMPOL_PTOTHER NON SEL, ADMPOL_PTSEL (ref: non-sel. in highly sel. area)",
    TRUE ~ model_vars
  ),
  Source = case_when(
    model_vars %in% c("ATT8SCR", "ATT8SCR_FSM6CLA1A", "ATT8SCR_NFSM6CLA1A",
                       "PTFSM6CLA1A", "PTPRIORLO", "ADMPOL_PT") ~ "KS4 Performance Tables",
    model_vars %in% c("PERCTOT") ~ "Absence Statistics",
    model_vars %in% c("PNUMEAL") ~ "School Census",
    model_vars %in% c("OFSTEDRATING_1") ~ "Ofsted Inspections (harmonised)",
    model_vars %in% c("gorard_segregation") ~ "Derived (Gorard index)",
    model_vars %in% c("remained_in_the_same_school",
                       "teachers_on_leadership_pay_range_percent",
                       "average_number_of_days_taken") ~ "School Workforce Census",
    model_vars %in% c("gor_name", "LANAME") ~ "Edubase",
    model_vars == "year_label" ~ "Constructed",
    TRUE ~ ""
  )
) %>%
  kable(align = "llll") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, font_size = 10)
Table 15: Variable definitions from DfE metadata
| Variable Code | Description | In Formula As | Source |
|---|---|---|---|
| ATT8SCR | Avg. Attainment 8 score per pupil | log(ATT8SCR) | KS4 Performance Tables |
| ATT8SCR_FSM6CLA1A | Avg. Attainment 8 score (disadvantaged) | log(ATT8SCR_FSM6CLA1A) | KS4 Performance Tables |
| ATT8SCR_NFSM6CLA1A | Avg. Attainment 8 score (non-disadvantaged) | log(ATT8SCR_NFSM6CLA1A) | KS4 Performance Tables |
| PTFSM6CLA1A | % KS4 pupils who are disadvantaged | log(PTFSM6CLA1A) | KS4 Performance Tables |
| PERCTOT | % overall absence | log(PERCTOT) | Absence Statistics |
| PNUMEAL | % pupils with English not first language | log(PNUMEAL) | School Census |
| PTPRIORLO | % KS4 pupils with low prior attainment (KS2) | PTPRIORLO | KS4 Performance Tables |
| ADMPOL_PT | Admissions policy (new definition from 2019) | Two dummy variables (treatment-coded): ADMPOL_PTOTHER NON SEL, ADMPOL_PTSEL (ref: non-sel. in highly sel. area) | KS4 Performance Tables |
| gorard_segregation | Gorard segregation index | gorard_segregation | Derived (Gorard index) |
| remained_in_the_same_school | Teachers remaining in same school (FTE) | log(remained_in_the_same_school) | School Workforce Census |
| teachers_on_leadership_pay_range_percent | % teachers on leadership pay range | teachers_on_leadership_pay_range_percent | School Workforce Census |
| average_number_of_days_taken | Avg. teacher sickness days | log(average_number_of_days_taken) | School Workforce Census |
| OFSTEDRATING_1 | Ofsted rating (harmonised) | (1 \| OFSTEDRATING_1) | Ofsted Inspections (harmonised) |
| gor_name | Government Office Region | (1 \| gor_name) | Edubase |
| LANAME | Local authority name | (1 \| LANAME) | Edubase |
| year_label | Academic year (short) | (1 \| year_label) | Constructed |
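For readers unfamiliar with the derived predictor, gorard_segregation follows Gorard's segregation index: half the sum, over schools in an area, of the absolute differences between each school's share of the area's disadvantaged pupils and its share of all pupils. A minimal sketch with made-up counts (the report's values come from the data pipeline, not this code):

```r
# Gorard segregation index for one area: 0 = disadvantaged pupils spread
# exactly in proportion to school size; larger = more clustered.
gorard_gs <- function(fsm, total) {
  0.5 * sum(abs(fsm / sum(fsm) - total / sum(total)))
}

# Three hypothetical, equally sized schools in one LA:
fsm   <- c(40, 10, 50)     # disadvantaged pupils per school
total <- c(200, 200, 200)  # all pupils per school
round(gorard_gs(fsm, total), 3)  # 0.233
```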