Against Face Analysis

The use of mugshots in research, and the shift from face recognition to face attribute classification.

Kyle McDonald


In 2018 I read a research paper, written by an insurance company, about estimating body mass index from face photos. The paper had a diagram, a mosaic of photos of mostly Black men, showing that the training data was built on mugshots taken by the Florida Department of Corrections. I’ve been making art about face analysis since 2009, and I was generally familiar with the long history connecting it to policing, and with the way it is deployed to reinforce systemic racism and mass incarceration. I knew mugshots were easily accessible. But I didn’t know that mugshots were used to build these systems, and this seemed like another layer of abuse. I decided to dig deeper, with support from the Eyebeam Rapid Response fellowship.

A grid of blurred mugshots showing five examples of each of five body mass index categories, from underweight to severely obese.

What I learned is that there is a feedback loop connecting face research, surveillance, and mass incarceration. Researchers build tech for face analysis, surveillance companies sell the tech to police, and police use it to justify arrests. Finally, the mugshots are used by the researchers to improve their algorithms or by the government to test the accuracy of their systems. This is just one small piece of a complex system. But in some ways it’s reminiscent of the establishment of the Florida prison system itself, which was designed to have prison labor cover all prison expenses.

I decided to focus on facial attribute classification in my research. While face detection locates a face (drawing a box around it), and face recognition names a face (“Kyle McDonald”), facial attribute classification describes a face (“white, male, 30s”). As we fight for new regulations across the country, face recognition is becoming a less viable option for many police and government agencies. And I’ve noticed that surveillance companies are pivoting around the controversy: they describe facial attribute classification as going “beyond identity” and handling “the people who are not in the database” where face recognition fails. They make a point of their products “not hosting identities”. The companies focused on facial attribute classification try to make it clear that they do not provide face recognition. Surveillance companies like Qognify, Vintra, BriefCam, Milestone, Veritone, CloudWalk, and others are aware of the problems with racial bias in policing (in stops, searches, use of force, and beyond), and say they are trying to build systems where “human involvement is no longer required”, or that they want to provide “identification of potential threats without bias” with systems that “are less impacted by implicit biases and emotional responses”. Even outside face analysis, in the context of predictive policing, companies like PredPol make a point of saying they are “not profiling individuals”, only providing “accurate prediction of crime location & time”.

Marketing diagram showing a progression from “human in the loop” to “human on the loop” to “human out of the loop”.
Source: Veritone

As I was researching the way this tech is developed and deployed, I came back to thinking about the people in these mugshots. I wanted to know how it’s possible for their data to be so easily accessed and misused. I found that across the country it’s incredibly easy to download mugshots and very personal information. Florida in particular offers a one-click download of a 2GB database dump, updated once a month, that can be used to download half a million mugshots. I found that most states have easy-to-scrape access to this kind of data, totaling over 11 million records.

Mostly blurred spreadsheet, with a list of states and totals on the left side, ranging from thousands to hundreds of thousands.
Screenshot of research into how mugshots are stored and accessed.

Mass incarceration is complex, and I’ve been learning a lot over this fellowship, but what’s clear is that the most vulnerable people should not be treated as training data. This abuse can be understood as a natural extension of the many ways in which incarcerated people regularly have their basic human rights violated (especially solitary confinement). What I’m trying to do with my work is find some space for creative justice.

Reflecting on Angela Davis’ observation that “it’s so easy to just forget, to think about the prison and its population abstractly”, I went back to the body mass index paper. I wanted to understand: where are the researchers coming from? Who are the people in the mugshots?

I decided to inhabit the role of the researcher. I want to replicate the method of research without replicating the violence of the research. To me, that means downloading and processing all half million mugshots from the Florida Department of Corrections, but not publishing a research paper, not sharing a trained model, and not selling my work to be weaponized by an insurance company or police department. Through this process, I learn about all the things the researchers will never write. Some of it is banal, like the typos in the database, or the duplicates and glitchy photos that consistently crash computer vision scripts.

But I also learn that there are so many more pieces of information besides height and weight: there is sentence length, aliases, tattoos, charges and convictions, race, gender, age, even intended address of residence after release. And any of these things could be predicted just as easily as body mass index by modifying one or two lines of code. Whether the prediction is “accurate” is completely immaterial. There are hardly any regulations against these predictions, even in cases where face recognition is banned. This direction falls perfectly in line with the new interest in facial attribute classification.

I came back to the diagram that drew me in initially: that 5x5 grid of mugshots. In this diagram, all of their eyes are barely covered with a thin black bar. After spending some time with these images abstracted as “data”, I realized that this bar is not intended to protect the identity of the people in the mugshots at all. It feels clear the black bar is designed to protect the researcher from having to meet the gaze of the people that they are exploiting as data.

I came up with a plan to remove the black bar, to push back and steal the photos away from the domain of abstract quantification. I built a face recognition system for all half million photos that would help me map each mugshot to a person. Essentially a surveillance system in miniature, using the same tools and the same data but removed from the context of institutional power.

Diagram of training process for a face recognition algorithm, showing data flowing through layers with annotated equations.
Diagram of face recognition architecture from InsightFace

With a chance to interrogate the system directly, I could see examples of face recognition confusing two people for each other. I could see that any practical face recognition system always finds a “match”, even when the target identity is not in the dataset. When I finally de-anonymized all the mugshots in the paper, I realized another sad and disturbing detail: four of the 25 people had been released, and five had passed away while incarcerated. Extrapolating from that small sample, a large percentage of the researchers’ training data came from people who had already served their time but were still in the database, and perhaps a hundred thousand of the half million mugshots were photos of people who had passed away.
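
The “always finds a match” behavior mentioned above is easy to see in a toy example. What follows is a minimal sketch, not the code I ran and not the insightface API: it assumes faces have already been reduced to embedding vectors, uses random vectors as stand-ins for real embeddings, and the function names and the 0.6 threshold are hypothetical choices made only for illustration. A nearest-neighbor search returns whichever enrolled vector is closest, no matter how far away it is, so a stranger still gets a “match” unless someone deliberately adds a cutoff.

```python
import numpy as np

def normalize(v):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def closed_set_match(gallery, probe):
    """Return (index, similarity) of the most similar gallery embedding.

    argmax always returns some index, so this never reports "no match".
    """
    sims = gallery @ probe
    best = int(np.argmax(sims))
    return best, float(sims[best])

def open_set_match(gallery, probe, threshold):
    """Same search, but return None when the best similarity is below the threshold."""
    best, sim = closed_set_match(gallery, probe)
    return (best, sim) if sim >= threshold else None

# Synthetic data only: random vectors standing in for face embeddings.
rng = np.random.default_rng(0)
gallery = normalize(rng.normal(size=(1000, 128)))  # the "enrolled" people
stranger = normalize(rng.normal(size=128))         # someone who is not enrolled

idx, sim = closed_set_match(gallery, stranger)
print(f"closed-set 'match': index {idx}, similarity {sim:.2f}")

# The threshold value is an illustrative assumption, not a recommended setting.
print("open-set result:", open_set_match(gallery, stranger, threshold=0.6))
```

Even the thresholded version only moves the problem: someone has to choose the cutoff, and that choice decides who gets falsely matched.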

These systems are sold to us as pure, objective algorithms for improving society. The truth is that they are built on violence: the violence of the police and carceral system that is an extension of slavery. They are built on the erasure of dignity and human rights of incarcerated people, even after serving their time or dying. They become a dataset for researchers who do not understand or acknowledge their complicity, and their work reinforces the institutional racism that makes these algorithms possible.

What can I do as an artist that can bring justice to this? Can I subvert and misuse these tools in a way that dismantles the system, or at least puts a dent in it, instead of simply reinforcing it? I am very open to dialogue and critique, and I would appreciate feedback as this work continues to develop.

Thanks to Eyebeam for this incredible fellowship, and to the people who have spent time meeting with me, advising, and brainstorming, especially Dillon Sung, Adam Harvey, Caspar Sonnen, Cara Oba, Kyle Oba, A.M. Darke, and the Media As Socio-Technical Systems group at USC.