
From Measurement to Mitigation: Quantifying and Reducing Identity Leakage

We present a comprehensive study of identity leakage in visual embeddings and introduce Identity Sanitization Projection (ISP) as an effective mitigation. This work takes a step toward making powerful vision models more privacy-friendly, which is crucial for real-world deployment.

Daniel George, Charles Yeh, Daniel Lee, Yifei Zhang



Introduction

Organizations increasingly rely on visual embeddings, numerical representations of images that preserve semantic similarity. For example, retailers and marketplaces use embeddings to spot duplicate listings, catch policy violations, and power visual search (“show me similar products”). Social and media companies use them for near-duplicate detection, content clustering, and integrity checks around reuse or manipulation. Banks and identity vendors use embeddings of document and selfie images for fraud prevention.

Facial information inevitably ends up in the data that's fed into these embedding models. But unlike facial recognition embeddings, the embeddings that power non-facial recognition tasks are generally assumed to respect privacy, because the models are not intentionally trained to represent faces well.

Until recently, the strength of this assumption wasn't tested, especially under the conditions that an adversary would actually face. If an adversary got hold of these embeddings through a breach, could they uniquely identify someone? Would they be able to reconstruct a recognizable face from the embedding alone?

In this study, we consider how much someone can infer about identity if they only have access to a non-facial embedding. While prior work often looked at face-recognition models or used weaker evaluation settings, we focus on attacker-aligned metrics:

Verification at very low false-accept rates, mirroring the standards of real biometric deployments
Open-set evaluation, which tests identities that weren't seen during training
Inversion attempts, in which an attacker tries to reconstruct a face from an embedding

From selfie to visual embedding

Frozen models turn each image into a high-dimensional visual embedding. Different photos of the same person are clustered together in the embedding space. The clustering could make identity recoverable from embeddings, and removing identity-correlated elements could lessen this risk.

[Figure: selfies from four people (Person A through Person D, 16 selfies each), each mapped by a frozen model to an embedding vector, for example (0.62, −0.31, 0.08, …).]

Methodology

Our experiments use two common benchmark datasets, CelebA-20 and VGGFace2-20. Each dataset has 20 images for each of 480 different people (9,600 images total). We split the people into three disjoint groups so that no identity is shared between training, validation, and testing.

320 people for training: We use this pool to fit anything that must learn from labeled faces. For example, we build the Identity Sanitization Projection (ISP) described below by computing per-person means on these identities. Test identities never appear here, so the attacker never “sees” the people we later score.
80 people for validation: We use this pool to choose operating points and configurations without contaminating the final scorecard. For example, we set the acceptance threshold to achieve a false-acceptance rate (FAR) near 10⁻⁴ on impostor pairs among the validation identities.
80 people for final testing: We only use this group for evaluation after other choices from validation are fixed. Using separate identities allows us to determine how well the methods work on individuals not seen in training or validation.
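The identity-disjoint split above can be sketched as follows. This is a minimal illustration, not the paper's code: the function name `split_identities`, the random seed, and the shuffling strategy are assumptions; only the 320/80/80 sizes and the 480 × 20 layout come from the text.

```python
import numpy as np

def split_identities(identity_ids, n_train=320, n_val=80, n_test=80, seed=0):
    """Split unique identity IDs into disjoint train/val/test pools."""
    ids = np.unique(np.asarray(identity_ids))
    if len(ids) != n_train + n_val + n_test:
        raise ValueError("pool sizes must add up to the number of identities")
    # Shuffle identities (not images) so no person lands in two pools.
    shuffled = np.random.default_rng(seed).permutation(ids)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# Hypothetical label array: 480 identities with 20 images each.
labels = np.repeat(np.arange(480), 20)
train_ids, val_ids, test_ids = split_identities(labels)

# Image-level masks follow from the identity pools.
train_mask = np.isin(labels, train_ids)
test_mask = np.isin(labels, test_ids)
```

Splitting by identity rather than by image is what makes the later evaluation open-set: every image of a test person is unseen when anything is fit or tuned.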

For each embedding type, we use a pre-trained model with fixed weights (a “frozen” model) to map each image to a visual embedding.

Here's a closer look at the methodology for the experiments:

Few-shot identity probing: We simulate an attacker who must decide whether a new embedding belongs to the same person as a small set of known (support) embeddings. The attacker operates at a strict FAR, the rate at which the verifier wrongly links embeddings from different people. We report the true acceptance rate (TAR), the rate at which the verifier correctly links embeddings from the same identity. We simulate attacks with 1, 4, or 16 support embeddings.
Attribution: We want to know whether a model's notion of “same person” depends more on the face or on contextual factors, such as framing, clothing, or background.
Template inversion: We take a target embedding (a “template”) and ask whether an attacker can synthesize a portrait whose embedding matches the template without seeing the original photo.
Identity Sanitization Projection (ISP): We collect embeddings from a labeled dataset, compute the average embedding per person, and find how the per-person averages differ. Then, we remove potentially identifying components and figure out if the new embedding is still useful for non-biometric search and classification.
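The ISP step described in the last item can be sketched in a few lines of NumPy. This is a hedged illustration under stated assumptions, not the paper's implementation: it assumes the identifying directions are the top principal directions of the centered per-person means (found with an SVD) and that `rank`, the number of directions removed, is a free parameter. The function names and the toy data are hypothetical.

```python
import numpy as np

def class_means(embeddings, labels):
    """Return one mean embedding per identity, shape (n_identities, d)."""
    return np.stack([embeddings[labels == c].mean(axis=0)
                     for c in np.unique(labels)])

def fit_isp(embeddings, labels, rank):
    """Fit a linear projector that removes the top `rank` directions
    along which the per-person means differ (an assumed formulation)."""
    means = class_means(embeddings, labels)
    centered = means - means.mean(axis=0, keepdims=True)
    # Rows of vt are orthonormal directions, ordered by how much the means vary.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:rank]
    return np.eye(embeddings.shape[1]) - basis.T @ basis

def between_identity_scatter(embeddings, labels):
    """Total squared deviation of per-person means from their grand mean."""
    means = class_means(embeddings, labels)
    return float(((means - means.mean(axis=0)) ** 2).sum())

# Toy data: identity signal lives in the first two of six dimensions.
rng = np.random.default_rng(0)
n_ids, per_id, dim = 8, 16, 6
labels = np.repeat(np.arange(n_ids), per_id)
offsets = np.zeros((n_ids, dim))
offsets[:, :2] = rng.normal(scale=3.0, size=(n_ids, 2))
X = offsets[labels] + rng.normal(scale=0.1, size=(n_ids * per_id, dim))

P = fit_isp(X, labels, rank=2)
X_isp = X @ P  # P is symmetric, so right-multiplication applies it row-wise

print(between_identity_scatter(X, labels), between_identity_scatter(X_isp, labels))
```

Because the projector is a fixed matrix, it can be fit once on labeled training identities and then applied to any embedding, which is what makes this kind of mitigation easy to audit.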

Removing potentially identifying components

The paper's identifying subspace spans many dimensions. We attempt to remove components from potentially identifying areas of the subspace (represented here by the x and y axes) while preserving functionality.

[Figure: raw embeddings (left) and embeddings after ISP (right) for four identities, 16 samples per identity.]

Experiments and results

Our experiments put the methodology described above into action. We start with few-shot open-set probing at low FAR, then stress nonlinear probes, utility on downstream tasks, attribution, and template inversion. Together, our experiments connect the measurement of how much identity information each model leaks to our mitigation efforts with ISP.

Few-shot identity probing

Few-shot probing asks a straightforward question: if someone trains a lightweight verifier on one set of people and then verifies new people, how strong is the identity signal at a strict false-acceptance rate (FAR)?

To answer it, we train a verifier on embeddings to decide whether two embeddings come from the same person. To align with how biometric verification is generally evaluated, we report TAR at a fixed FAR of 10⁻⁴.
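Reporting TAR at a fixed FAR comes down to two steps: pick a threshold from impostor scores, then count how many genuine scores clear it. The sketch below assumes similarity scores where higher means "more likely the same person" and a strict `>` acceptance rule; the function names and toy scores are hypothetical. Note that a target FAR of 10⁻⁴ only yields a meaningful threshold with at least 10,000 impostor pairs.

```python
import numpy as np

def threshold_at_far(impostor_scores, target_far):
    """Smallest threshold whose false-accept rate on the given impostor
    scores does not exceed target_far (accept if score > threshold)."""
    scores = np.sort(np.asarray(impostor_scores, dtype=np.float64))[::-1]
    # Number of impostor pairs we may wrongly accept; the small epsilon
    # guards against floating-point products like 1.9999999.
    n_allowed = int(np.floor(target_far * len(scores) + 1e-9))
    if n_allowed >= len(scores):
        return -np.inf
    return float(scores[n_allowed])

def acceptance_rate(scores, threshold):
    """Fraction of scores accepted at the threshold (FAR or TAR)."""
    return float(np.mean(np.asarray(scores, dtype=np.float64) > threshold))

# Toy example with a loose FAR so the arithmetic is easy to follow.
val_impostor = np.linspace(0.0, 0.99, 100)        # different-person pairs
test_genuine = np.array([0.5, 0.975, 0.99, 1.0])  # same-person pairs

thr = threshold_at_far(val_impostor, target_far=0.02)
far_val = acceptance_rate(val_impostor, thr)
tar_test = acceptance_rate(test_genuine, thr)
```

In the setting described in this article, the threshold would be chosen on impostor pairs from the validation identities and the TAR measured on the held-out test identities.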

Datasets and projections

Each result is specific to a dataset (CelebA-20 or VGGFace2-20) and a projection (Raw, or one of the ISP variants we propose for removing identity information from embeddings):

Raw: Embeddings taken directly from the model without any privacy-preserving measures.
ISP-W: Fit using identities from the same dataset you are evaluating (W = within dataset).
ISP-X: Fit using identities from the other dataset (X = cross dataset).

Results are reported for k = 1, 4, and 16, where k is the number of images per identity that the attacker can use when training or conditioning a verifier to determine whether two embeddings come from the same person. In principle, the more embeddings of someone the attacker has (a higher k), the stronger their verifier can be.

Results are shown for each embedding model. A higher TAR on embeddings from models not trained for face recognition (non-FR) indicates that more identifying information is accessible.


TAR results and the change from raw results

Open-set few-shot probe: TAR (%) at FAR = 10⁻⁴. Columns are models; rows are support size k. When the projection is not Raw, each cell also includes the change in percentage points relative to Raw embeddings for that model and k.
k	DINOv2	DINOv3	CLIP	SSCD	ArcFace	AdaFace
k = 1	4.5%	4.5%	16.4%	6.6%	93.7%	93.6%
k = 4	5.5%	6.7%	19.7%	8.4%	94.0%	94.0%
k = 16	5.7%	6.8%	19.8%	9.8%	94.0%	94.2%

How to read the results

A few patterns emerge in the results:

Raw: Dedicated FR models reach a high TAR at low FAR by design. A linear verifier already has a hard time pulling identity out of the non-FR models: true acceptance is modest for DINO/SSCD and somewhat higher for CLIP.
ISP within a dataset: For non-FR models, the acceptance rate falls to a few percent on both datasets and is near random for a strict matcher, indicating that ISP removes identity-relevant information.
Cross-dataset ISP: A projector fit on one dataset works about as well on the other dataset, indicating that the information removed isn't tied to the specific people used to fit the projector.

The results also look similar whether the attacker receives 1 or 16 support images, which is a good sign for protecting against few-shot and many-shot attackers.

Utility retention for non-facial tasks

Low TAR means the attacker struggles to verify identity after ISP, but it says nothing about whether the embedding still performs well on other tasks. Ideally, ISP removes identity information while preserving non-biometric utility. To measure this impact, we evaluate ISP-projected embeddings on ImageNet classification (frozen k-NN and linear probe), a task these embeddings are intended to perform well on.
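A frozen k-NN evaluation of this kind can be sketched as below. The cosine similarity, the majority-vote rule, the function names, and the toy data are assumptions made for illustration; the paper's exact k-NN protocol is not described here. To compare utility, the same function would be run once on raw embeddings and once on ISP-projected embeddings.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    x = np.asarray(x, dtype=np.float64)
    return x / np.clip(np.linalg.norm(x, axis=1, keepdims=True), 1e-12, None)

def knn_top1_accuracy(train_x, train_y, test_x, test_y, k=1):
    """Top-1 accuracy of a cosine k-NN classifier on frozen embeddings.
    Labels must be non-negative integers; vote ties go to the smaller label."""
    train_y = np.asarray(train_y)
    test_y = np.asarray(test_y)
    sims = l2_normalize(test_x) @ l2_normalize(train_x).T
    nearest = np.argsort(-sims, axis=1)[:, :k]   # indices of the k most similar
    votes = train_y[nearest]                     # shape (n_test, k)
    n_classes = int(train_y.max()) + 1
    preds = np.array([np.bincount(row, minlength=n_classes).argmax()
                      for row in votes])
    return float(np.mean(preds == test_y))

# Toy example: two well-separated classes in two dimensions.
train_x = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_y = [0, 0, 1, 1]
test_x = [[1.0, 0.05], [0.05, 1.0]]
test_y = [0, 1]

acc_k1 = knn_top1_accuracy(train_x, train_y, test_x, test_y, k=1)
acc_k3 = knn_top1_accuracy(train_x, train_y, test_x, test_y, k=3)
```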

[Figure: k-NN Top-1 accuracy before and after ISP.]

Context attribution

Modern models can entangle identity evidence with non-facial context (backgrounds, clothing, hair, scene, etc.), a phenomenon widely studied in object recognition as “context bias” or “shortcut” reliance.

In biometrics, explainable AI work has primarily focused on where a face recognition (FR) model looks rather than quantifying the relative importance of face vs. context. Our attribution-as-measurement framework addresses this gap with:

Face-coverage ratio (FCR): We resize or crop each image so the face takes up a similar fraction of the frame in every photo. This lets us fairly measure the impact of changing part of the face or background.
Face Importance Index (FII): We obstruct parts of the face and then an equal part of the background to measure the impact on the calculated similarity between the original and obscured image. FII helps us understand whether obstructing the face hurts similarity more, less, or the same as obstructing the background.
Context Preference Index (CPI): For each query image, we construct two reference images: an identity-matched image (same person, different context) and a context-matched image (different person via face swap, same context). We apply a Gaussian blur to the face in all three images, and CPI measures whether a model considers the identity-matched or the context-matched reference to be the closer match to the query as the amount of blur changes.
Background revelation threshold (B^*): As a stress test, we reveal more of the background bit by bit while the face crop stays fixed. B^* is how much background has to appear before the context-matched image beats the identity-matched one in similarity.
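As one concrete reading of the last item, B^* can be computed from two similarity curves measured at increasing background fractions. This sketch assumes the fractions are sorted in ascending order and that "beats" means strictly greater similarity; the function name and the example numbers are hypothetical and are not results from the paper.

```python
import numpy as np

def background_revelation_threshold(fractions, sim_identity, sim_context):
    """Smallest revealed-background fraction at which the context-matched
    reference is more similar to the query than the identity-matched one.
    Returns None if the context-matched reference never wins."""
    fractions = np.asarray(fractions, dtype=np.float64)
    crossed = np.asarray(sim_context) > np.asarray(sim_identity)
    if not crossed.any():
        return None
    # argmax on a boolean array gives the index of the first True.
    return float(fractions[np.argmax(crossed)])

# Made-up similarity curves for one query image.
fractions = [0.0, 0.1, 0.2, 0.3]
sim_identity = [0.80, 0.78, 0.75, 0.70]  # same person, different context
sim_context = [0.50, 0.70, 0.76, 0.90]   # different person, same context

b_star = background_revelation_threshold(fractions, sim_identity, sim_context)
never = background_revelation_threshold(fractions, sim_identity, [0.1, 0.1, 0.1, 0.1])
```

A model that resists background revelation, as the FR baselines do, would show a large B^* or none at all.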

We find that identity-focused FR baselines are face-dominant (obstructing the face hurts most, and they resist background revelation), whereas non-FR models inside tight crops are context-dominant under the stress test (background can outweigh face when forced).

After ISP, the most conspicuous change is in context preference. Models that previously over-relied on same-context signals rebalance toward same-person preference. Face-obstruction sensitivity either decreases or remains low, indicating that ISP removes face-aligned identity evidence without creating brittle behavior elsewhere.

Overall, we find that the diagnostics separate FR and non-FR models under open-set protocols and remain stable across pairing strategies.

Template inversion

Template inversion tests whether the vector carries enough recoverable facial identity that a strong generative model can “paint” a convincing face back. Or, in plain English, if someone steals an embedding, can they recreate an image of a face that passes a selfie similarity test?

To do this, we give an attacker a target embedding vector (a “template”) and a model. The attacker does not see the real photo that produced the template, and they try to recreate a facial image from the embedding.

No optimizer covers every failure mode, so we evaluate four attack families: DiffMI (diffusion-based), Vec2Face (regression-based), Bob (score-based), and ALSUV (latent-optimization-based). We use matched resource budgets and a fixed operating point, and an attack is successful if the synthetic image verifies against the real target.
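The success criterion can be made concrete with a small scoring helper. This sketch assumes verification uses cosine similarity between the target embedding and the embedding of the synthetic image, with a threshold fixed in advance (for example, from validation data); the function name, the threshold value, and the toy vectors are hypothetical.

```python
import numpy as np

def inversion_success_rate(target_embs, recon_embs, threshold):
    """Fraction of targets whose reconstruction verifies against them,
    using cosine similarity and a fixed acceptance threshold."""
    t = np.asarray(target_embs, dtype=np.float64)
    r = np.asarray(recon_embs, dtype=np.float64)
    t = t / np.clip(np.linalg.norm(t, axis=1, keepdims=True), 1e-12, None)
    r = r / np.clip(np.linalg.norm(r, axis=1, keepdims=True), 1e-12, None)
    sims = np.sum(t * r, axis=1)  # row-wise cosine similarity
    return float(np.mean(sims > threshold))

# Toy example: the first reconstruction is close to its target, the second is not.
targets = [[1.0, 0.0], [0.0, 1.0]]
recons = [[1.0, 0.1], [1.0, 0.0]]
rate = inversion_success_rate(targets, recons, threshold=0.9)
```

Holding the threshold fixed across attack families keeps the success rates comparable between them.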

We find that applying ISP does not materially increase or decrease inversion success for non-FR models. The empirical evidence suggests that the identity signal in non-FR embeddings is too weak for current inversion methods to exploit, both before and after projection.

Limitations and future work

Our study is restricted to frozen models and linear susceptibility, which has several limitations:

Our identity subspace estimates rely on labeled identities and may be sensitive to demographic imbalance or domain mismatch, though our cross-dataset transfer results suggest good robustness.
ISP only targets the between-class mean structure, and higher-order identity cues may remain in the complementary space.
Template inversion depends on the attack's generative model, optimization, and compute budget, so a low verification rate in our results does not prove that embeddings are safe or that no future method could succeed.

Future work should evaluate privacy under black-box and gradient-free adversaries, extend ISP with iterative or non-linear variants that retain its auditability, and explore training-time integration to obtain models that are privacy-preserving by design.

Additionally, scaling our attribution and inversion audits to broader datasets and sensitive attributes beyond identity remains an important direction for deployment-ready privacy assessments.

Conclusion

We presented a comprehensive study of identity leakage in visual embeddings and introduced Identity Sanitization Projection (ISP) as an effective mitigation.

Focusing on TAR at low FAR, inversion success, face-coverage ratio (FCR), context preference index (CPI), and background revelation threshold (B^*) allows us to quantify identity leakage and show that models like CLIP and DINOv2 expose little accessible identity information.

With ISP, we further dampen identity leakage and offer substantial privacy gains with minimal impact on the embeddings' usefulness for other tasks. We also provided theoretical justification and practical evidence for ISP's effectiveness.

This work takes a step toward making powerful vision models more privacy-friendly, which is crucial for real-world deployment. By open-sourcing our evaluation toolkit and encouraging others to adopt privacy metrics, we hope to spur the development of even more advanced mitigation techniques.

Ultimately, we envision that non-FR embeddings can retain their general utility while offering guarantees that using them will not compromise individuals' privacy.

You can find the full paper, references, and appendices on arXiv.