Data ju-jitsu: contextual attainment vs progress measures. Fight!

If you’ve read any of my blog posts over the years, then you probably know that I’m slightly obsessed with progress measures. My opinion is that progress measures break assessment – they make schools do weird things with data. For the sake of convenience, we assume a child’s development is linear, and buy-in therefore requires a total suspension of disbelief. National curriculum levels were fine as a broad indicator of attainment, and gave some notion of progress, but the pressure to show progress over shorter and shorter periods resulted in their fracturing, first into sublevels and then into sub-sub-levels. In the end, we were having to pretend that we knew the difference between a 2B and a 2B+.

We ended up in the same situation with P-scales, as noted by the 2016 Rochford Review:

P scales presume linear progression; they presume that pupils will move on from one skill or concept to a more challenging or advanced skill or concept in a linear fashion. As with national curriculum levels, there is an in-built incentive for schools to encourage progression onto the next P level.

The pressure to use P-scales to measure pupil progress caused that system to become broken beyond repair, and they went the way of levels.

Next, it was Development Matters. The old framework stated:

Children develop at their own rates, and in their own ways. The development statements and their order should not be taken as necessary steps for individual children. They should not be used as checklists. The age/stage bands overlap because these are not fixed age boundaries but suggest a typical range of development.

But for years, Development Matters’ age/stage bands had been used to measure pupil progress. Too broad to provide any sense of progression over a year, let alone a term, they became split into sublevels (30-50 Emerging/Developing/Secure etc) and each was treated as a discrete increment along the scale. This was worse than the subdivision of levels because it ignored the deliberate overlap between the bands. It took some serious mathematical ju-jitsu to believe that 40-60 Emerging was higher than 30-50 Secure – isn’t 40 lower than 50?

These approaches are still very much alive. Levels have been replaced with new metrics that look suspiciously similar to the old ones, and many schools evidently still feel the need to use teacher assessment to measure progress each term (or are pressured to do so!). This seems to be especially the case when it comes to SEND, where schools want to ‘show the small steps of progress’. And secondary schools love their flightpaths, where pupils get a subdivided GCSE grade (4-, 5=, 6+ etc) which either indicates what grade they are working towards or, worse, what grade they are currently working at. This will then be compared to the target line to give a sense of value-added.

There are two problems here:

  1. Teacher assessment is not standardised. You can’t use human opinion to measure something. If I’m putting up shelves, I don’t guess the width of the alcove; I use a standardised tool for the job. In that scenario, I’d use a tape measure; to measure progress, the equivalent to a tape measure is a standardised test.
  2. No measure of progress we attempt to make in schools will match what the government calculates for the school performance tables. As the Primary Accountability Guidance states: Schools should not try to predict pupil or school level progress scores in advance of official provisional data being available each September.

Under Pressure

It is the pressure of accountability that causes schools to invent systems that bend assessment data beyond its elastic limit until it snaps. At that point, the decision is taken to ditch that particular system. We then start again and repeat the same mistakes. I have long thought that one option is to ditch progress measures from the accountability system and concentrate on attainment, especially in the primary phase where the focus should be on pupils reaching expected standards, not on making expected progress. I understand that this is controversial because progress measures are seen to be fairer, but we have done without progress measures for 4 out of the last 6 years and the sky didn’t fall on our heads. Also, unlike Progress 8, which involves standardised tests at each end, primary progress measures are heavily reliant on teacher assessment, especially for the generation of the baseline where the use of tests would be inappropriate.

There is a paradox in primary education: everyone wants a progress measure, but no one wants a baseline. The Reception Baseline is extremely unpopular – it is time-consuming, tells teachers little that they didn’t already know, and a move to online has been marred by technical problems. The DfE considered using the outcomes of EYFSP, phonics, and MTC assessments as an alternative baseline for those cohorts that were missing KS1 results due to the pandemic, but concluded that these all had ‘statistical issues that made them unsuitable’.

So, if we don’t want progress measures because of the need for a baseline (which is likely to have statistical issues) and we are concerned that pure attainment measures will unfairly affect schools with the highest levels of disadvantage, what other option is there?

Contextual Attainment Comparisons

In amongst the documents to read for this week’s governors’ meeting was a new DfE report titled ‘Compare your good level of development (GLD) data’. In it, the percentage of pupils reaching a Good Level of Development at the end of the Early Years Foundation Stage (EYFS) was compared to a ‘contextual GLD score’, which is defined as ‘an estimate of your GLD with certain cohort characteristics taken into account’. The annex at the back of the report lists 14 contextual factors used to generate this estimate, with an intention to add another one in a future version of the model.

Essentially, the contextual GLD score is the result you might expect if the cohort, with its particular set of characteristics, did what is seen in the population as a whole. Let’s say a school sees 70% of pupils reach a good level of development, but the DfE calculate that you’d typically expect 75% of such a cohort to achieve that level. The report would then show a difference of -5%. If that difference is bigger than the percentage value of one child in the cohort – which would be the case in a one-form-entry school, where each pupil accounts for around 3% – then the school’s result is deemed to be ‘lower than predicted’.

Another school, with higher levels of disadvantage, might have a GLD result of 65%, which is lower than the school above, but if their contextual GLD score is 60% then they may be classified as ‘better than predicted’. In all other cases, where the result is broadly in line with predicted – where the gap is less than the percentage value of a child – the school will be ‘at the predicted level’. Notably, the actual and expected figures in the report are three-year averages, which have the added benefit of ironing out the spikiness seen in many primary school results.
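The comparison described above boils down to a simple rule: take the difference between the actual and contextual GLD percentages, and compare it to the percentage value of one child in the cohort. Here is a minimal sketch of that logic in Python – the function name and the exact thresholding are my illustration of the report’s description, not the DfE’s actual calculation (which also uses three-year averages and may round or band differently):

```python
def classify_gld(actual_pct, contextual_pct, cohort_size):
    """Classify a school's GLD result against its contextual estimate.

    The threshold is the percentage value of one child in the cohort,
    e.g. roughly 3% for a 30-pupil one-form-entry cohort.
    (Illustrative only; the DfE's published method may differ in detail.)
    """
    one_child = 100.0 / cohort_size
    diff = actual_pct - contextual_pct
    if diff < -one_child:
        return "lower than predicted"
    if diff > one_child:
        return "better than predicted"
    return "at the predicted level"

# The two schools from the examples above (30-pupil cohorts assumed):
print(classify_gld(70, 75, 30))  # -5% gap vs a ~3.3% threshold
print(classify_gld(65, 60, 30))  # +5% gap vs a ~3.3% threshold
```

The first school, despite the higher raw result, comes out ‘lower than predicted’, while the second comes out ‘better than predicted’ – which is precisely the point of a contextual comparison.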

I remember suggesting something similar a few years ago – a system of attainment comparisons based on the results of contextually similar schools – as an alternative to progress measures. I was told then, by someone who knows a lot more about this stuff than I do, that it has been tried, and the measures were not robust. But the cat’s out of the bag now. The DfE have produced contextual attainment measures for EYFSP and could do the same for phonics, MTC and – most importantly – KS2. Yes, there is a risk that schools with higher levels of disadvantage could be ‘excused’ for having lower levels of attainment, but that happens with progress measures anyway. And placing contextual attainment comparisons alongside standard measures of attainment – which are compared to national averages – would lessen that risk (e.g. the school does well in comparison to similar schools but is still below average). I don’t see this as being much different to the current system of progress vs attainment.

If the DfE were to implement this, it would do away with the need for an unpopular baseline assessment. It would also solve the problem of progress measures for non-all-through primary schools (infant, first, junior, and middle schools – around 3000 schools in England), which will be omitted from the progress aspect of the accountability system when the reception baseline kicks in in 2028. In the absence of any progress measures, those schools have been told that they will have ‘responsibility for evidencing progress using their own assessment information’, thus creating a two-tier system of accountability.

It’s not perfect, but contextual attainment comparisons would solve a lot of issues with school performance measures, and help to reduce workload in schools, especially in reception, which is currently the only year in the English state school system to have two statutory assessments.

And maybe it would help to stop schools doing weird things with data.

Insight Inform is brought to you by Insight, the UK’s leading system for monitoring pupil progress.

www.insighttracking.com
