Data Sharing: A Guide to Anonymization

Check out the full Privacy Engineering Certification

For more on this topic, take the Course: Data Sharing

Storing and sharing data requires a thoughtful balance of privacy protection and business needs. Use this resource as a guide to consider your current practices and drive change in your organization.

Step 1: Take Stock Before You Share

Take stock of the data that you collect, store, and share. What types of data do you collect? How sensitive is that data? How precise is it?

Precision drives identifiability. The more precise, or specific, the data is, the greater the risk to the people it describes if the data is compromised. Keep two relationships in mind to protect data privacy.

Data precision should always have an INVERSE relationship with both access and retention period.

TL;DR: Precise data should be short-lived and less accessible. Aggregated data can be available to more people and stored for longer periods of time.
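
One way to make these two relationships concrete is to write them down as a handling policy per precision tier. The sketch below is illustrative only: the tier names, role names, and retention periods are assumptions chosen for the example, not recommendations from this guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandlingPolicy:
    max_retention_days: int
    allowed_roles: frozenset


# Hypothetical tiers, ordered from most precise to least precise.
# All role names and day counts are placeholder assumptions.
POLICIES = {
    "precise": HandlingPolicy(30, frozenset({"privacy-engineer"})),
    "coarsened": HandlingPolicy(180, frozenset({"privacy-engineer", "analyst"})),
    "aggregated": HandlingPolicy(
        730, frozenset({"privacy-engineer", "analyst", "all-staff"})
    ),
}


def may_access(tier: str, role: str) -> bool:
    """Return True if the role is allowed to read data in this tier."""
    return role in POLICIES[tier].allowed_roles


def is_expired(tier: str, age_days: int) -> bool:
    """Return True if data of this age has outlived its tier's retention period."""
    return age_days > POLICIES[tier].max_retention_days
```

In this sketch, the more precise the tier, the fewer roles can read it and the sooner it expires, which is the inverse relationship described above.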

Step 2: Minimize and Coarsen Sensitive Data

Delete, or better yet never collect, sensitive data that you don't need.

Next, coarsen any remaining sensitive data that presents a privacy risk. Coarsening data means making it less precise using tactics like:

Replacing personally identifiable data with internally generated unique values, or with values generated by a keyed pseudorandom function
Rounding values such as timestamps to be less specific
Converting or truncating coordinates such as GPS coordinates to represent a broader area
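
The three tactics above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: HMAC-SHA256 stands in for the keyed pseudorandom function, timestamps are rounded down to the hour, and coordinates are rounded to two decimal places. The function names and the hard-coded key are hypothetical; a real key would be stored and rotated in a secrets manager.

```python
import hashlib
import hmac
from datetime import datetime
from typing import Tuple

# Hypothetical key for illustration only; never hard-code a real key.
SECRET_KEY = b"example-key-do-not-use"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed pseudorandom value (HMAC-SHA256)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def round_timestamp(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def coarsen_coordinates(lat: float, lon: float, decimals: int = 2) -> Tuple[float, float]:
    """Round coordinates to fewer decimal places to represent a broader area.

    Two decimal places of latitude is roughly one kilometer.
    """
    return (round(lat, decimals), round(lon, decimals))
```

The same input always maps to the same pseudonym under a given key, so records can still be joined, but someone without the key cannot recompute the mapping from a list of known identifiers.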

Remember:

You will always have to balance the need for data privacy with business outcomes. That means minimizing and coarsening data enough that it is sufficiently anonymous, while still being able to carry out the operations and analysis needed for your business.

Step 3: Measure Impact

Obfuscating and anonymizing data is a critical step in protecting data privacy, but it's just as important to understand how successful your efforts were.

Measuring the impact of anonymization techniques is more of an art than a science. K-Anonymity and L-Diversity are two measures that can help you show value and impact to stakeholders in your organization.

K-Anonymity:

Quasi-identifying attributes are suppressed or generalized until each row is identical, on those attributes, to at least K-1 other rows

Best Practice: Target a K-Anonymity of 5

A K-Anonymity of 5 means that you will have obfuscated the data such that for each record, there are at least 4 others that are indistinguishable from it on their quasi-identifiers, making that record less individually identifiable.
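
As a rough sketch of how this could be measured, the K of a dataset is the size of its smallest group of rows that share the same quasi-identifier values. The function name, column names, and sample rows below are assumptions made up for the example.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group of rows sharing the same
    quasi-identifier values (0 for an empty dataset)."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


# Hypothetical, already-coarsened sample data.
rows = [
    {"age_band": "30-39", "zip_prefix": "941", "diagnosis": "flu"},
    {"age_band": "30-39", "zip_prefix": "941", "diagnosis": "asthma"},
    {"age_band": "30-39", "zip_prefix": "941", "diagnosis": "flu"},
    {"age_band": "40-49", "zip_prefix": "100", "diagnosis": "flu"},
    {"age_band": "40-49", "zip_prefix": "100", "diagnosis": "flu"},
]

k = k_anonymity(rows, ["age_band", "zip_prefix"])  # smallest group has 2 rows
```

Here the dataset has a K-Anonymity of 2, so it would need further coarsening or suppression to reach the target of 5.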

L-Diversity:

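Where K-Anonymity hides an individual in the crowd by ensuring that any combination of quasi-identifier values appears in at least K records, L-Diversity measures the diversity of sensitive attributes in each data bucket: every bucket must contain at least L distinct values of the sensitive attribute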
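
A minimal sketch of the simplest form of this measure (distinct L-Diversity) counts the distinct sensitive values in each bucket and reports the smallest count. The function name, column names, and sample rows are assumptions made up for the example.

```python
from collections import defaultdict


def l_diversity(rows, quasi_identifiers, sensitive):
    """Return the smallest number of distinct sensitive values found in any
    bucket of rows sharing the same quasi-identifier values."""
    buckets = defaultdict(set)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        buckets[key].add(row[sensitive])
    return min(len(values) for values in buckets.values()) if buckets else 0


# Hypothetical sample data: every bucket has at least 2 rows (K = 2).
records = [
    {"age_band": "30-39", "zip_prefix": "941", "diagnosis": "flu"},
    {"age_band": "30-39", "zip_prefix": "941", "diagnosis": "asthma"},
    {"age_band": "30-39", "zip_prefix": "941", "diagnosis": "flu"},
    {"age_band": "40-49", "zip_prefix": "100", "diagnosis": "flu"},
    {"age_band": "40-49", "zip_prefix": "100", "diagnosis": "flu"},
]

l_value = l_diversity(records, ["age_band", "zip_prefix"], "diagnosis")
```

In this sample the second bucket contains only one diagnosis, so the L-Diversity is 1: anyone known to be in that bucket has their diagnosis revealed even though the bucket holds more than one row. This is the gap that L-Diversity is meant to expose.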

Check out these resources to learn more about K-Anonymity and L-Diversity.