Because of the unprecedented COVID-19 pandemic, A-level exams in Britain were cancelled for the class of 2020. Grades would instead be determined by teachers' rankings of pupils' past work. Once submitted, the rankings would be adjusted by the exams regulator, Ofqual, using an algorithm. This standardisation was meant to stop students from receiving inaccurate grades because of unconscious bias.
The process failed. Around 40% of teacher-assessed grades were lowered by the algorithm, thousands of students were rejected by universities, and many felt a deep sense of injustice. In the following weeks, many of them protested outside the Department for Education and Downing Street. Fortunately, their protests were heard, and on August 17th the government announced a U-turn: A-level grades would revert to teacher assessments. But why did the algorithm fail in the first place, and what does this mean for the future of AI in public services?
Ofqual's grading algorithm took four inputs into account:
- The school's historical grade distribution from 2017 to 2019
- A predicted grade distribution based on the current class's GCSE results
- A predicted grade distribution based on previous years' GCSE results
- Whether historical data was available for the individual student
The list above suggests that the fault lay less with the algorithm itself than with the people who designed it. With so few inputs, the algorithm could not form a complete view of any individual student's capabilities.
According to research by Kaili Rimfeld at King's College London, the best predictor of exam success is previous success in exams. The algorithm did take historical performance into account, but at the level of the school rather than the student, and individual students paid the price. Strong pupils at poorly performing schools, many of whom come from lower-income backgrounds, were hit hardest. This only widens the educational attainment gap. In fact, in Scotland, where results were also moderated against schools' past performance, children from poorer backgrounds were twice as likely to be downgraded as those from richer areas.
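To see how tying grades to a school's history can cap a strong pupil, here is a deliberately simplified Python sketch. It is not Ofqual's model: it assumes only the first input (the school's historical grade distribution) and ignores any GCSE-based adjustment. The function name `allocate_grades`, the student names and the percentages are all hypothetical. The sketch walks down the teacher's rank order and hands out grades so that the class reproduces the school's past distribution.

```python
from typing import Dict, List

# Grades from best to worst.
GRADES = ["A*", "A", "B", "C", "D", "E", "U"]


def allocate_grades(ranked: List[str], historical_pct: Dict[str, int]) -> Dict[str, str]:
    """Assign grades to a teacher's rank order (best student first) so that the
    class reproduces the school's historical grade distribution.

    Hypothetical simplification: historical_pct maps grade -> whole-number
    percentage of past entries and must sum to 100. No GCSE adjustment is made.
    """
    if set(historical_pct) - set(GRADES):
        raise ValueError("unknown grade in historical distribution")
    if sum(historical_pct.values()) != 100:
        raise ValueError("historical percentages must sum to 100")

    n = len(ranked)
    grades: Dict[str, str] = {}
    cumulative_pct = 0
    next_student = 0
    for grade in GRADES:
        cumulative_pct += historical_pct.get(grade, 0)
        # Number of students who should get this grade or better,
        # rounded half up using integer arithmetic to avoid float error.
        cutoff = (cumulative_pct * n + 50) // 100
        while next_student < cutoff:
            grades[ranked[next_student]] = grade
            next_student += 1
    return grades


if __name__ == "__main__":
    # A hypothetical school where nobody has historically achieved above a B.
    history = {"B": 20, "C": 40, "D": 30, "E": 10}
    print(allocate_grades(["Asha", "Ben", "Cara", "Dev", "Eli"], history))
    # {'Asha': 'B', 'Ben': 'C', 'Cara': 'C', 'Dev': 'D', 'Eli': 'D'}
```

However strong the top-ranked pupil is, and whatever grade their teacher predicted, the sketch can award them no more than the best grade the school has achieved in the past. The rank order decides who gets which grade, but the school's history decides which grades are available.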
The biggest flaw in the algorithm lies in its last input. Historical data is less available for students at elite private schools than for students at state schools. If, for example, GCSE data was unavailable, the algorithm proceeded with historical A-level results alone; if both were available, it used both data sets. The more data the algorithm had, the more it could adjust a teacher's assessment, so students at low-income state schools were more likely to have their grades altered.
This incident underlines the difficulty of implementing automated decision-making technologies in the public sector, and it has certainly eroded public trust in the government's use of AI. Yet figures such as Adrian Smith (Director and Chief Executive of The Alan Turing Institute) have suggested that this fiasco need not hold back innovation. What is required is a human-led cultural and structural change that ensures algorithms are used equitably and accountably in the future.