Bristol AI scoring of children: algorithm failure and the threat of bias

Avon and Somerset Police and Bristol City Council have stopped using two artificial intelligence models designed to assess the risk of crimes against children. The reasons were critically low accuracy and complete opacity: independent auditors could find neither the source code nor the list of variables used, which makes such systems impossible to verify.
How the Data Was Collected: A "Big Bucket" of Information
The project was based on the Think Family Database, launched in 2016. It combined police and social data on residents: housing status, mental health, teenage pregnancies, school truancy, and even receipt of free school meals. The information was collected without citizens' direct consent, under legal provisions for data sharing between government agencies. One police specialist candidly described the approach as "mixing everything into a big bucket."
Machine learning models built on this database assigned risk scores to adults and children. Journalists are aware of at least 23 such models, ranging from burglary prediction to the likelihood of becoming a victim of domestic violence. The Offender Management App operated in parallel; a senior officer called it a "leaderboard" of the most dangerous criminals.
Why the Algorithms Failed
The model for assessing the risk of crimes against children used data from the police, the city council, and the charity Barnardo's, including anonymized data on 1,000 children who had already been victims of such crimes. The scoring was influenced by a child's status as needing help, persistent school truancy, and mental health issues. Another model considered housing support, rent arrears, and free school meals.
As early as 2016, the police ethics committee warned of the risk of algorithmic bias. Later, the consulting firm Social Finance called the risk scoring the weakest element of the project. Low accuracy undermined the practical value of the models. By the time of the audit, both systems were no longer in use.
The quality of the models deteriorated as the dataset changed. The police tried to scale the approach to the entire region but could not agree on data sharing with all local councils. As a result, the models were left with mainly a police "core" of data and no social indicators. City service employees complained that vulnerable children did not show up in the results, and that child victims of crime received lower scores than individuals involved in theft cases.
Audit: Low Accuracy and Lost Documents
The auditing firm Eticas, after analyzing over 36,000 performance evaluations, concluded that most models had a low positive predictive value: a significant proportion of the people the system flagged as high risk were flagged in error. For example, a model designed to identify potential burglars stayed below 10% on this measure for over three years, meaning fewer than one in ten people flagged by the system actually committed such a crime.
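To make the "one in ten" figure concrete, here is a minimal sketch of how positive predictive value is calculated. The counts below are hypothetical and chosen only for illustration; they are not taken from the Eticas audit, and the function name is an assumption, not part of any system described here.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Share of flagged people for whom the predicted outcome actually occurred."""
    flagged = true_positives + false_positives
    if flagged == 0:
        # With nobody flagged, the ratio is undefined.
        raise ValueError("no one was flagged, so PPV is undefined")
    return true_positives / flagged


# Hypothetical example: 1,000 people flagged as likely burglars,
# of whom 90 went on to commit a burglary.
flagged_total = 1000
confirmed = 90
ppv = positive_predictive_value(confirmed, flagged_total - confirmed)
print(f"PPV: {ppv:.1%}")  # PPV: 9.0%
```

With these illustrative numbers, 910 of the 1,000 people flagged would have been marked high risk without committing the predicted crime.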
By June 2023, neither the police nor Bristol City Council had retained any documents recording the decision to abandon the two models for assessing the risk of crimes against children. The source code and variable list could not be found. Authorities now use only the NEET risk model, which estimates the likelihood that a child will not be in education, employment, or training after leaving school.
Context: PoliceAI and Systemic Risks
This story unfolds against the backdrop of the launch of PoliceAI, a national center with a budget of £75 million for testing AI tools for the 43 police forces in England and Wales. The Bristol case shows that the risks of such models lie not only in algorithmic accuracy but also in data quality, documentation retention, and the possibility of independent verification.
Expert Opinion. The Bristol case is a classic example of how rushing to implement AI in law enforcement without proper auditing and transparency can discredit the very idea. When a system cannot distinguish a crime victim from a potential criminal, it is not just a technical error but a direct threat to justice. Until regulators introduce mandatory verification standards for such algorithms, similar failures will recur.