Evaluating the accuracy of our primary SATs predictions (2026)

June 9, 2026

Prediction Accuracy

Curriculum & Assessment

Primary Schools

At the heart of Smartgrade is a standardisation engine. We take a range of assessments (including partner assessments, publicly available tests and assessments we’ve created) and apply our own standardisation algorithms to generate percentile ranks and standardised scores. For primary assessments, we now also calculate what we call performance indicators (BLW, WTS, EXS and HS) and scaled score indicators (using an 80-120 scale, where 100 corresponds to the KS2 expected standard and 110 to the higher standard). These use the mappings of percentiles to scaled scores that the DfE releases annually to infer an indicator from our percentile rank.
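To make the percentile-to-indicator step concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not Smartgrade's actual algorithm: the anchor points in `ILLUSTRATIVE_MAPPING` are made up rather than real DfE values, linear interpolation between anchors is an assumption, and the `wts_floor` boundary between BLW and WTS is a hypothetical parameter. Only the 100 (expected standard) and 110 (higher standard) thresholds come from the text above.

```python
from bisect import bisect_right

# HYPOTHETICAL (percentile, scaled score) anchor points, for illustration only.
# Real mappings would come from the DfE's annual releases.
ILLUSTRATIVE_MAPPING = [(1, 80), (25, 95), (40, 100), (75, 110), (99, 120)]


def scaled_score_indicator(percentile, mapping=ILLUSTRATIVE_MAPPING):
    """Infer a scaled score indicator (80-120) from a percentile rank.

    Assumes linear interpolation between anchor points and clamps
    percentiles that fall outside the mapped range.
    """
    percentiles = [p for p, _ in mapping]
    if percentile <= percentiles[0]:
        return mapping[0][1]
    if percentile >= percentiles[-1]:
        return mapping[-1][1]
    i = bisect_right(percentiles, percentile)
    (p0, s0), (p1, s1) = mapping[i - 1], mapping[i]
    return round(s0 + (s1 - s0) * (percentile - p0) / (p1 - p0))


def performance_indicator(scaled_score, wts_floor=90):
    """Map a scaled score indicator to a performance indicator band.

    100 and 110 are the expected and higher standards; wts_floor is a
    hypothetical cut-off between BLW and WTS, chosen only for this sketch.
    """
    if scaled_score >= 110:
        return "HS"
    if scaled_score >= 100:
        return "EXS"
    if scaled_score >= wts_floor:
        return "WTS"
    return "BLW"
```

With these illustrative anchors, a pupil at the 40th percentile would get a scaled score indicator of 100 and a performance indicator of EXS.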

In 2024 we analysed the accuracy of our predictions across a popular set of assessments that we run: Key Stage 2 practice SATs. We thought it was high time we did this again - and this time, we brought in Owen Carter, an education data expert, as an independent consultant to improve the analysis and its methodology. We also shared some year 5 reading comprehension assessment data with Owen.

What data did we use to perform the analysis?

Before we get to the findings, we should explain a little more about our Key Stage 2 Practice SATs package and the methodology we used.

Schools who use our practice SATs package get access to maths, reading comprehension and GPS assessments in 4 windows, one in each of 4 half terms - Autumn 1, Autumn 2, Spring 1 and Spring 2 - each using a different real SATs past paper. (We also now offer year 5 baseline assessments, though these were not part of this analysis.) Schools around the country sit the relevant SATs papers and enter their data into Smartgrade. We then provide them with our standardised grades, including the performance indicators and scaled score indicators. (We also provide a bunch of additional analysis at question and topic level, as well as aggregations at class, school and MAT level.)

Our methodology involved taking our predicted results and matching them with actuals that were kindly provided by one of our MAT partners. We ended up with sample sizes of 4,000+ students for each assessment, so we’re confident that we have a robust dataset from which to infer accuracy.
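The matching step described above can be sketched as a simple join. This is an assumption about the shape of the data, not a description of the actual pipeline: the `(pupil_id, subject)` key and the choice to drop unmatched pupils rather than impute them are both hypothetical.

```python
def match_predictions(predicted, actual):
    """Pair each pupil's predicted scaled score with their actual one.

    Both arguments are dicts keyed by a hypothetical (pupil_id, subject)
    tuple. Pupils present on only one side are dropped, not imputed.
    """
    shared_keys = sorted(predicted.keys() & actual.keys())
    return [(key, predicted[key], actual[key]) for key in shared_keys]


# Usage with made-up records: only pupils present in both sets survive.
predicted = {("p1", "maths"): 101, ("p2", "maths"): 97, ("p3", "maths"): 108}
actual = {("p1", "maths"): 103, ("p3", "maths"): 107, ("p4", "maths"): 99}
pairs = match_predictions(predicted, actual)
```

Dropping unmatched pupils keeps the comparison like-for-like, at the cost of a smaller sample than the full set of predictions.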

So how accurate are Smartgrade’s predictions?

The good news is that we’re pretty accurate! To quote Owen’s report, “The analysis presented in this report indicates that Smartgrade’s prediction model is performing well”. Here are some of the key findings from the analysis:

The proportion of students who achieved at or above the performance indicator we predicted was 87%. The per-assessment rates ranged from 83% (Autumn 1 GPS) to 93% (Spring 2 Mathematics). We tend to focus on this metric because our methodology is designed to err on the side of pessimism: the thing we most want to avoid is predicting overly positive outcomes. 99% of students were within 1 performance indicator band of their prediction.
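The mean scaled score variance (the average signed difference between predicted and actual scaled scores) was -0.8. Because over- and under-predictions partly cancel out in a signed average, this tells us that, on average, our predictions were within 1 scaled score point of the actual outcome. The Mean Absolute Error was 3.78, meaning that the average child received a scaled score prediction that was within 4 scaled score points of their achieved score.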
Pearson’s r across all assessments was 0.82. This is a statistical measure of the strength of correlation between two variables - in other words, in this case, it shows whether predicted and actual scores move together across pupils as you would expect if the predictions are working well. There were variations between subjects, but all subjects showed a strong correlation: Mathematics was 0.85, GPS was 0.84 and Reading was 0.78. As a comparator, the EEF’s 2018 report on standardised test accuracy found that the correlation between commercial tests and Key Stage 2 mathematics is typically between 0.7 and 0.8, and for reading typically between 0.6 and 0.7. So our standardisations here are consistently outperforming a wide range of commercially available standardised assessments.
Maths provides the most reliable predictions, while reading comprehension is the least reliable of the subjects analysed. This isn’t surprising to us - we found something similar in 2024 and it mirrors the findings of the EEF study cited above. Still, it is an important thing to bear in mind when looking at our - and indeed any - primary assessment. To be specific, in mathematics, nearly two-thirds of students received a prediction that was within 3 scaled score points of their achieved score. For reading, it was more like half of students.
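For readers who want to run the same kind of check on their own data, the metrics in the findings above can be sketched in plain Python. This is an illustrative sketch, not the code used in the report: the function names are hypothetical, and the sign convention (predicted minus actual, so a negative mean means predictions were on average below outcomes) is an assumption.

```python
from math import sqrt

BANDS = ["BLW", "WTS", "EXS", "HS"]  # ordered lowest to highest


def band_metrics(predicted_bands, actual_bands):
    """Share of pupils at or above their predicted band, and within one band."""
    gaps = [BANDS.index(a) - BANDS.index(p)
            for p, a in zip(predicted_bands, actual_bands)]
    n = len(gaps)
    return {
        "at_or_above": sum(g >= 0 for g in gaps) / n,
        "within_one_band": sum(abs(g) <= 1 for g in gaps) / n,
    }


def score_metrics(predicted, actual, tolerance=3):
    """Mean signed difference, mean absolute error, Pearson's r and the
    share of pupils whose prediction is within `tolerance` points."""
    diffs = [p - a for p, a in zip(predicted, actual)]  # assumed sign convention
    n = len(diffs)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return {
        "mean_difference": sum(diffs) / n,
        "mean_absolute_error": sum(abs(d) for d in diffs) / n,
        "pearson_r": cov / sqrt(var_p * var_a),
        "within_tolerance": sum(abs(d) <= tolerance for d in diffs) / n,
    }


# Usage with four made-up pupils (not real data).
bands = band_metrics(["WTS", "EXS", "EXS", "HS"], ["EXS", "EXS", "WTS", "HS"])
scores = score_metrics([100, 105, 95, 110], [102, 104, 99, 110])
```

Reporting the signed mean alongside the mean absolute error is useful because the first shows whether predictions lean pessimistic or optimistic overall, while the second shows how far off a typical individual prediction is.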

What more can we do?

So overall, it was good news, and the accuracy this time around reflected quite closely what we found in 2024. That said, it’s worth taking a moment to note a couple of areas where we think we can improve:

There were some variances when performance was analysed at demographic level. The report notes that “while demographic variation in prediction accuracy is generally small, it does indicate a generally lower level of accuracy for disadvantaged students or those with additional needs”. For this reason, we’re tweaking our standardisation approach for September 2026 onwards to include demographic considerations when weighting our standardisation samples.

Reading comprehension accuracy is consistently lower than in Maths and GPS. While this is clearly an issue that stretches beyond Smartgrade, as the same phenomenon is documented in the EEF’s report on commercial standardised assessments, we want to help schools be more precise in how they assess Reading. We’re therefore doing two things within Smartgrade to assist with this:

We’re making it possible to average results easily from multiple assessment windows. One way we think we can improve Reading Comprehension prediction accuracy is to look at the average of the most recent two or three assessment results, rather than just looking at the most recent window. So we’re introducing a report that will make it easy to view these predictions in Smartgrade.
We’re introducing a Reading Fluency assessment. While this won’t change the accuracy of our Reading Comprehension predictions, it will provide some incredibly useful diagnostic information about children’s reading ability, which we think will help to provide a more complete picture of the individual and their needs. More information about our Reading Fluency plans can be found in our Smartgrade Assessment Series launch blog.
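The averaging idea in the first of these two points can be sketched as follows. This is a minimal illustration under assumptions, not the planned report's logic: the function name, the use of `None` for a missed window, and a simple unweighted mean are all hypothetical choices.

```python
def averaged_prediction(scores_by_window, n_recent=2):
    """Average the most recent `n_recent` scaled score indicators.

    `scores_by_window` is ordered oldest to newest (for example Autumn 1,
    Autumn 2, Spring 1, Spring 2); None marks a window the pupil missed.
    Returns None if the pupil has no results at all.
    """
    sat = [s for s in scores_by_window if s is not None]
    if not sat:
        return None
    recent = sat[-n_recent:]
    return sum(recent) / len(recent)


# Usage with made-up reading indicators for one pupil.
reading = [98, 101, 97, 103]
last_two = averaged_prediction(reading, n_recent=2)    # mean of 97 and 103
last_three = averaged_prediction(reading, n_recent=3)  # mean of 101, 97 and 103
```

Averaging smooths out a single unusually good or bad sitting, at the cost of reacting more slowly to genuine progress between windows.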

We'd love to show you around Smartgrade - book a 30-minute demo with our sales team.