Can a Model Be Differentially Private and Fair?

Training models with differential privacy stops models from inadvertently leaking sensitive data, but there's an unexpected side-effect: reduced accuracy on underrepresented subgroups.

Imagine you want to use machine learning to suggest new bands to listen to. You could do this by having lots of people list their favorite bands and using them to train a model. The trained model might be quite useful and fun, but if someone pokes and prods at the model in just the right way, they could extract the music preferences of someone whose data was used to train the model. Other kinds of models are potentially vulnerable too: credit card numbers have been pulled out of language models, and actual faces have been reconstructed from image models.

Training with differential privacy limits how much information about any one data point can be extracted, but in some cases there’s an unexpected side-effect: reduced accuracy that disproportionately affects underrepresented subgroups.

Recall that machine learning models are typically trained with gradient descent, a series of small steps taken to minimize an error function. To show how a model can leak its training data, we’ve trained two simple models to separate red and blue dots using two datasets that differ in only one way: a single isolated data point in the upper left has been switched from red to blue.

Notice that the two models have very different boundary lines near the isolated point by the end of the training. Someone with access to the trained model might be able to infer if the point in the upper left is red or blue — if the color represented sensitive information, like someone’s voting record, that could be quite bad!
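The effect can be reproduced in a few lines of Python. This minimal sketch trains plain, non-private logistic regression with full-batch gradient descent on two hypothetical toy datasets (the coordinates are invented for illustration, not taken from the figure) that differ only in the label of one isolated upper-left point, then asks each model how likely that point is to be blue.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(points, labels, steps=2000, lr=0.5):
    """Full-batch gradient descent on the mean logistic loss.
    Returns (w1, w2, b); the boundary line is w1*x + w2*y + b = 0."""
    w1 = w2 = b = 0.0
    n = len(points)
    for _ in range(steps):
        g1 = g2 = gb = 0.0
        for (x, y), label in zip(points, labels):
            err = sigmoid(w1 * x + w2 * y + b) - label  # dLoss/dLogit
            g1 += err * x
            g2 += err * y
            gb += err
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b

def prob_blue(model, point):
    w1, w2, b = model
    x, y = point
    return sigmoid(w1 * x + w2 * y + b)

# Hypothetical toy data: label 0 = red, 1 = blue.
points = [(-1, -1), (-2, -1), (-1, -2), (1, 1), (2, 1), (1, 2), (-2, 2)]
labels_a = [0, 0, 0, 1, 1, 1, 0]  # isolated upper-left point is red
labels_b = [0, 0, 0, 1, 1, 1, 1]  # the same point switched to blue

model_a = train_logistic(points, labels_a)
model_b = train_logistic(points, labels_b)
outlier = (-2, 2)
print(prob_blue(model_a, outlier), prob_blue(model_b, outlier))
```

With these settings the two probabilities should land on opposite sides of 0.5, so anyone who can query the trained model at the outlier's location can read off its label.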

Protecting the Privacy of Training Points

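We can prevent a single data point from drastically altering the model by adding two operations to each training step: clipping the gradient, which bounds the maximum impact a single data point can have on the model, and adding random noise to the gradient.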
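The two operations, clipping each example's gradient and adding random noise (here Gaussian, as in DP-SGD), can be sketched as a single training step. This is a minimal sketch that assumes per-example gradients are already computed and represented as plain lists; the function names and the `noise_multiplier` parameter are illustrative choices, not the article's code.

```python
import math
import random

def clip(grad, max_norm):
    """Scale a per-example gradient down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_sgd_step(weights, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One clip-and-noise gradient step in the style of DP-SGD.

    1. Clip each example's gradient to max_norm.
    2. Sum the clipped gradients and add Gaussian noise with standard
       deviation noise_multiplier * max_norm to every coordinate.
    3. Average over the batch and take an ordinary gradient step.
    """
    total = [0.0] * len(weights)
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            total[i] += g
    sigma = noise_multiplier * max_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    n = len(per_example_grads)
    return [w - lr * g / n for w, g in zip(weights, noisy)]

rng = random.Random(0)
grads = [[3.0, 4.0], [0.1, -0.2], [0.0, 0.5]]  # hypothetical per-example gradients
weights = dp_sgd_step([0.0, 0.0], grads, lr=0.1, max_norm=1.0,
                      noise_multiplier=1.1, rng=rng)
print(weights)
```

Because every clipped gradient has norm at most `max_norm`, adding or removing one point changes the summed gradient by at most `max_norm`, and the noise is scaled to that bound. Libraries such as TensorFlow Privacy and Opacus also track the cumulative privacy cost across many steps, which this sketch omits.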

Try increasing the random noise below. We’re now training lots of differentially private models; the more the potential models for the red and blue outlier points overlap, the more plausible deniability the person in the upper left has.
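The link between noise and overlap can be made concrete with a simpler mechanism than noisy training. This sketch uses the Laplace mechanism, a standard ε-differentially private way to release a single number, as a stand-in: if changing one data point can shift the true answer by at most `sensitivity`, adding Laplace noise with scale `sensitivity / epsilon` keeps the log-ratio of the two output densities within ε. The function names are illustrative, and the check runs over a finite grid of outputs.

```python
import math

def laplace_pdf(x, mu, b):
    """Density of a Laplace distribution with location mu and scale b."""
    return math.exp(-abs(x - mu) / b) / (2 * b)

def worst_case_log_ratio(epsilon, sensitivity=1.0):
    """Largest |log p(x | D) - log p(x | D')| over a grid of outputs x.

    The two densities are the Laplace-mechanism outputs for neighboring
    datasets whose true answers differ by `sensitivity`; the noise scale
    sensitivity / epsilon is what makes the mechanism epsilon-DP."""
    b = sensitivity / epsilon
    xs = [i / 100 for i in range(-1000, 1001)]  # outputs from -10 to 10
    return max(
        abs(math.log(laplace_pdf(x, 0.0, b)) - math.log(laplace_pdf(x, sensitivity, b)))
        for x in xs
    )

for eps in (0.1, 1.0, 5.0):
    print(eps, worst_case_log_ratio(eps))
```

Each printed ratio should equal ε up to floating-point error. A smaller ε forces the two output distributions to overlap more, which is the plausible deniability the figure shows.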

You can also try dragging the other points around and adjusting the gradient clipping. Are points in the center or outliers more likely to modify the boundary lines? In two dimensions there are only a limited number of outliers, but in higher dimensions more points are outliers and much more information can be extracted from a trained model.

Correctly combined, adding gradient clipping and random noise to gradient descent makes it possible to train a model with differential privacy: we can guarantee that a model trained on a given dataset is essentially indistinguishable from a model trained on the same dataset with a single point changed.
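One standard way to make "essentially indistinguishable" precise is (ε, δ)-differential privacy. A randomized training procedure M satisfies it if, for every pair of datasets D and D′ that differ in a single point and every set S of possible trained models:

```latex
\Pr\left[\mathcal{M}(D) \in S\right] \;\le\; e^{\varepsilon}\,\Pr\left[\mathcal{M}(D') \in S\right] + \delta
```

Setting δ = 0 gives pure ε-differential privacy; the Gaussian noise typically paired with gradient clipping yields guarantees with a small nonzero δ. Smaller ε means the distributions of models trained on D and D′ overlap more.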

Predictions on Outliers Change the Most

What does this look like in practice? In “Distribution Density, Tails, and Outliers in Machine Learning,” a series of increasingly differentially private models were trained on MNIST digits. Every digit in the training set was ranked according to the highest level of privacy at which a model still classified it correctly.
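The ranking step can be sketched as follows, assuming each training digit's correctness has already been recorded for a model trained at each privacy level. The privacy levels, example ids and results are hypothetical, and the paper's procedure may differ in details, such as how several models per level are combined.

```python
def highest_private_level(levels, correct):
    """Highest privacy level at which the example was classified correctly,
    or None if no model got it right."""
    ok = [level for level, c in zip(levels, correct) if c]
    return max(ok) if ok else None

def rank_by_privacy_robustness(levels, results):
    """Sort example ids from most privacy-sensitive to most robust."""
    scores = {ex: highest_private_level(levels, c) for ex, c in results.items()}

    def key(ex):
        s = scores[ex]
        return float("-inf") if s is None else s  # never-correct examples first

    return sorted(scores, key=key)

# Hypothetical privacy levels (e.g. noise multipliers; larger = more private).
levels = [0.0, 0.5, 1.0, 2.0]
results = {
    "odd_3": [True, False, False, False],     # a "3" that looks like a "2"
    "slanted_1": [True, True, False, False],
    "typical_3": [True, True, True, True],
}
print(rank_by_privacy_robustness(levels, results))
```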

On the lower left, you can see digits labeled as “3” in the training data that look more like a “2” or a “9”. They’re very different from the other “3”s in the training data, so adding just a bit of privacy protection causes the model to no longer classify them as “3”. Under some specific circumstances, differential privacy can actually improve how well the model generalizes to data it wasn’t trained on by limiting the influence of spurious examples.

The right side shows more canonical digits which are classified correctly even with high levels of privacy because they’re quite similar to other digits in the training data.

The Accuracy Tradeoff

Limiting how much a model can learn from a single example does have a downside: it can also decrease the model’s accuracy. With 7,500 training points, 90% accuracy on MNIST digits is only achievable with an extremely low level of privacy protection; increasing privacy quickly lowers the model’s accuracy.

Collecting more training data offers a way out of this accuracy/privacy tradeoff. With 60,000 training points, 90% accuracy can be reached with a higher privacy level than almost all real-world deployments of differential privacy.
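One rough intuition for why more data helps (our simplification, not the article's analysis): in a clip-and-noise step the noise is added to the sum of clipped gradients, so after averaging, its standard deviation shrinks in proportion to the batch size, while the averaged clipped gradient keeps a norm of at most `max_norm`. The sketch assumes a larger dataset lets each step average over a larger batch at the same privacy level; real privacy accounting also depends on the sampling rate and the number of steps. The batch sizes and noise multiplier are hypothetical.

```python
import random
import statistics

def noise_std_of_average(noise_multiplier, max_norm, batch_size):
    """Per-coordinate std of the noise in the averaged gradient when
    Gaussian noise with std noise_multiplier * max_norm is added to the
    sum of batch_size clipped per-example gradients."""
    return noise_multiplier * max_norm / batch_size

def simulated_std(noise_multiplier, max_norm, batch_size, trials=20000, seed=0):
    """Monte Carlo check of the formula above."""
    rng = random.Random(seed)
    samples = [rng.gauss(0.0, noise_multiplier * max_norm) / batch_size
               for _ in range(trials)]
    return statistics.pstdev(samples)

for batch_size in (64, 256, 1024):  # hypothetical batch sizes
    print(batch_size,
          noise_std_of_average(1.1, 1.0, batch_size),
          simulated_std(1.1, 1.0, batch_size))
```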

Looking at the differences between predictions by digit class shows another potential complication: some classes are harder to identify than others. Detecting an “8” with high confidence requires more training data and/or lower privacy than detecting a “0” with high confidence.

This problem is exacerbated if the training data has fewer examples of one class than the others. Trying to predict an uncommon event with a differentially private model can require an enormous amount of data.
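Rebalancing the dataset doesn't sidestep this. One way to see the privacy cost of upsampling a rare class is the group-privacy property of pure ε-differential privacy: if M is ε-differentially private, then for any datasets D and D′ that differ in k records and any set S of outcomes,

```latex
\Pr\left[\mathcal{M}(D) \in S\right] \;\le\; e^{k\varepsilon}\,\Pr\left[\mathcal{M}(D') \in S\right]
```

If a person's example is duplicated k times, changing that person changes k records, so the guarantee for them weakens from ε to kε.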

Implications for Fairness

Outliers also aren’t evenly distributed within a class. Below, MNIST digits are colored by their sensitivity to higher privacy levels and projected with UMAP, forming several clusters of privacy-sensitive yellow digits. It’s possible to inadvertently train a model with good overall accuracy on a class but very low accuracy on a smaller group within the class.
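A gap like this is invisible in class-level accuracy, so it helps to report accuracy per subgroup as well. A minimal sketch, with hypothetical labels and predictions for twenty “1”s, two of which are slanted:

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Overall accuracy plus accuracy within each subgroup label."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        counts[group] += 1
        hits[group] += int(truth == pred)
    overall = sum(hits.values()) / sum(counts.values())
    per_group = {g: hits[g] / counts[g] for g in counts}
    return overall, per_group

# Hypothetical results: every upright "1" is right, both slanted ones are wrong.
y_true = ["1"] * 20
groups = ["upright"] * 18 + ["slanted"] * 2
y_pred = ["1"] * 18 + ["7"] * 2
overall, per_group = accuracy_by_group(y_true, y_pred, groups)
print(overall, per_group)
```

Here the class as a whole scores 90% while the slanted subgroup scores 0%.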

There’s nothing that makes a “1” slanted to the left intrinsically harder to classify, but because there are only a few slanted “1”s in the training data it’s difficult to make a model that classifies them accurately without leaking information.

This disparate impact doesn’t just happen in datasets of differently drawn digits: increased levels of differential privacy in a range of image and language models disproportionately decreased accuracy on underrepresented subgroups. And adding differential privacy to a medical model reduced the influence of Black patients’ data on the model while increasing the influence of white patients’ data.

Lowering the privacy level might not help non-majoritarian data points either – they’re the ones most susceptible to having their information exposed. Again, escaping the accuracy/privacy tradeoff requires collecting more data – this time from underrepresented subgroups.

More Reading

There are deep connections between generalization, memorization and privacy that are still not well understood. Slightly changing the privacy constraints, for example, can create new options. If public, unlabeled data exists, a “Private Aggregation of Teacher Ensembles” (PATE) could be used instead of gradient clipping and random noise to train a differentially private model with a smaller disparate impact on accuracy.
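In PATE, several "teacher" models are trained on disjoint partitions of the private data; each teacher labels public, unlabeled examples, and only a noisy aggregate of their votes is released to train a "student" model. The aggregation step, in the Laplace noisy-max form of the original PATE paper, can be sketched as below. The votes and the noise scale are hypothetical, and later PATE variants use Gaussian noise instead.

```python
import random
from collections import Counter

def noisy_max_label(teacher_votes, num_classes, laplace_scale, rng):
    """Count teacher votes per class, add Laplace noise to each count,
    and return the class with the largest noisy count."""
    counts = Counter(teacher_votes)
    noisy = []
    for c in range(num_classes):
        # Laplace(0, b) sampled as the difference of two exponentials with mean b.
        noise = (rng.expovariate(1 / laplace_scale)
                 - rng.expovariate(1 / laplace_scale))
        noisy.append(counts[c] + noise)
    return max(range(num_classes), key=lambda c: noisy[c])

# Hypothetical votes from 250 teachers on one public, unlabeled digit.
rng = random.Random(0)
votes = [7] * 200 + [1] * 40 + [9] * 10
print(noisy_max_label(votes, num_classes=10, laplace_scale=2.0, rng=rng))
```

When the teachers largely agree, as here, the noise rarely changes the winning label.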

Finding ways to increase privacy with a smaller impact on accuracy is an active area of research – model architectures designed with privacy in mind and better dataset cleaning look like promising avenues.

There are also additional accuracy/privacy/fairness tradeoffs beyond what’s discussed in this post. Even if a differentially private model doesn’t have large accuracy gaps between subgroups, enforcing fairness metrics can reduce privacy or accuracy.

This post focuses on protecting the privacy of individual data points. In practice more work might be necessary to ensure that the privacy of users – who could contribute much more than a single data point each – is also protected.

These questions are also significant outside of machine learning. Allocating resources based on a differentially private dataset – with no machine learning model involved – can also disproportionately affect different groups. The 2020 Census is the first to use differential privacy and this could have a wide range of impacts, including how congressional districts are drawn.


Adam Pearce // January 2022

Thanks to Abhradeep Thakurta, Andreas Terzis, Andy Coenen, Asma Ghandeharioun, Brendan McMahan, Ellen Jiang, Emily Reif, Fernanda Viégas, James Wexler, Kevin Robinson, Matthew Jagielski, Martin Wattenberg, Meredith Morris, Miguel Guevara, Nicolas Papernot and Nithum Thain for their help with this piece.


To speed up training at the cost of looser privacy bounds, gradients, clipping and noise can be calculated on a group of data points instead of individual data points.

The “ε” in ε-differential privacy essentially measures the overlap in two distributions after changing a single data point.

Clipping and noising are also used outside of differential privacy as regularization techniques to improve accuracy.

In addition to accidentally mislabeled examples, differential privacy can also provide some protection against data poisoning attacks.

While visually similar digits aren’t necessarily interpreted in similar ways by the model, the clustering of visually similar digits in the UMAP diagram at the bottom of the page (which projects embeddings from the penultimate layer of a digit classifier) suggests there is a close connection here.

Rebalancing the dataset without collecting more data doesn’t avoid this privacy/accuracy tradeoff – upsampling the smaller class reduces privacy and downsampling the larger class reduces data and lowers accuracy.

See the appendix on Subgroup Size and Accuracy for more detail.

Appendix: Subgroup Size and Accuracy

How, exactly, do the amount of training data, the privacy level and the percentage of data from a subgroup affect accuracy? Using MNIST digits rotated 90° as a stand-in for a smaller subgroup, we can see how the accuracy of a series of simple models that classify “1”s and “7”s changes based on these attributes.

On the far left, models without any rotated digits in the training data never classify those digits more accurately than random guessing. By rotating 5% of the training digits, a small slice of models with lots of training data and low privacy can accurately classify rotated digits.

Increasing the proportion of rotated digits to 10% or 20% or even more makes it possible to train a higher privacy model that performs well on both types of digits with the same amount of training data.
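Building one of these training sets can be sketched as rotating a random fraction of the images. The sketch assumes images are square lists of pixel rows and rotates clockwise (the direction used in the figure isn't stated); the function names and tiny stand-in images are hypothetical.

```python
import random

def rotate_90(image):
    """Rotate a square image (a list of pixel rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def mix_rotated(images, labels, rotated_fraction, rng):
    """Rotate a random `rotated_fraction` of the images.

    Returns (image, label, is_rotated) triples; the flag marks the rotated
    subgroup so accuracy can later be measured on it separately."""
    n_rotated = round(rotated_fraction * len(images))
    rotated_idx = set(rng.sample(range(len(images)), n_rotated))
    mixed = []
    for i, (image, label) in enumerate(zip(images, labels)):
        is_rotated = i in rotated_idx
        mixed.append((rotate_90(image) if is_rotated else image, label, is_rotated))
    return mixed

# Hypothetical tiny stand-ins for 28x28 MNIST "1"s and "7"s.
images = [[[i, 0], [0, 0]] for i in range(100)]
labels = ["1" if i % 2 else "7" for i in range(100)]
train = mix_rotated(images, labels, rotated_fraction=0.05, rng=random.Random(0))
print(sum(flag for _, _, flag in train))  # number of rotated examples
```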

Click on one of the models above and you can see how the accuracy gap shifts as the number of training points, privacy level and percentage of rotated digits are independently changed.

Intuitively, adding more training data yields diminishing marginal gains in accuracy. Accuracy on the smaller group of rotated digits, which may be just on the cusp of being learned, falls off faster as the effective amount of training data decreases: a disparate reduction in accuracy.
