Breaking Down Big Data: De-identification Standards to Protect Personal Information

“Breaking Down Big Data” is a new column written by members of the ACC big data sub-committee. Here, they discuss how to manage the ever-changing big data issues in the legal field.


Individuals value their privacy. In contrast, businesses value the ability to leverage personal information to deliver quality products and services to meet the needs of their clients. The legal standards that regulate the protection of personal information help bridge the gap between these two opposing interests.

Generally, under these standards, businesses retain the ability to analyze large data sets and create information assets to support key business objectives. This is possible through a framework of conditions intended to protect the personal information provided. These conditions include requirements to provide notice of the intended use of the data and to offer individuals the choice to move forward with disclosure before collection and use.

The first part of this series highlights the topic of de-identification, which is a technique required and employed by businesses to process personal information beyond typical regulatory constraints. Specifically, this article will address when de-identification may be applied, the legal standards under specific regulations for de-identifying personal information, and the effect meeting such de-identification standards has on the use of the remaining data set.

Obtaining the right to de-identify personal information

The ability to de-identify personal information is governed by statutes and contracts. For instance, under the Health Insurance Portability and Accountability Act (HIPAA), a business associate may use or disclose protected health information (PHI) as permitted by its business associate contract or as required by law. The business associate agreement must establish the permitted and required uses and disclosures of PHI, and should specify whether the business associate is permitted to de-identify PHI in accordance with 45 CFR 164.514(a)-(c).

In the context of the European Union’s General Data Protection Regulation (GDPR), anonymizing data is a form of data processing. Data processing requires obtaining unambiguous consent from data subjects before proceeding (an “opt-out” notice does not qualify), unless a different legal basis exists (e.g., the processing is required in relation to a contract entered into by the data subject). Lastly, where no statutory guidance exists, many commercial agreements will limit the processing of customer data solely to the extent necessary to deliver the services described in the respective agreement. As a result, vendors should take care to ensure contract language is drafted broadly enough to cover processing of customer data for current or future data analytics offerings.

Overview of de-identification standards

De-identification occurs when an individual’s identity is no longer ascertainable, or the risk of identifying an individual is sufficiently low, due to the removal of direct personal identifiers (e.g., a data subject’s first and last name) and indirect identifiers (phone numbers, email addresses, etc.). Regulatory requirements dictate whether a given de-identification standard has been met and the effect that meeting such standard has on the use of the remaining data set. Below is a summary of de-identification standards for two of the most prominent data protection statutes in the United States, HIPAA and GLBA, as compared against the de-identification standard under GDPR.

Recommended best practices

To manage compliance with regulatory standards for de-identification of personal information, consider implementing the following best practices:

  • Engage key stakeholders to understand the business objectives supported by collecting and processing personal information, the scope of data being collected, the intended use, and the current methods employed to de-identify that data.
  • Identify regulations or other laws that govern the data processing activities, as well as any requirements for de-identification.
  • Review contracts to understand rights and limitations for processing of personal information, including those related to de-identification.
  • Assess currently applied practices to de-identify and use personal information against any regulatory and contractual restrictions and determine their suitability.
  • Ensure the privacy program addresses when consent is needed for data processing activities, and engage with technical teams to ensure opt-in/opt-out consent mechanisms are properly built within applications, as well as tracked and managed internally.
  • Identify “de-identification” as one of the intended purposes of collection in notices seeking consent.
  • Work with stakeholders to build in a workflow process where individuals charged with the corporation’s compliance with privacy restrictions are notified of newly intended uses or collection of personal information (a “privacy by design” approach).
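
For technical teams assessing current de-identification methods, the removal of direct and indirect identifiers described in the overview above can be illustrated with a minimal Python sketch. The field names and the sample record are hypothetical and chosen only for illustration; which fields count as identifiers depends on the data set and on the applicable regulation and contracts, and removing fields does not by itself establish that any legal de-identification standard has been met.

```python
# Hypothetical field names, used only for illustration. The applicable
# regulation and contracts determine which fields are identifiers.
DIRECT_IDENTIFIERS = {"first_name", "last_name"}
INDIRECT_IDENTIFIERS = {"phone", "email"}


def strip_identifiers(record: dict) -> dict:
    """Return a copy of the record without the listed identifier fields."""
    blocked = DIRECT_IDENTIFIERS | INDIRECT_IDENTIFIERS
    return {key: value for key, value in record.items() if key not in blocked}


# Sample record with made-up values.
record = {
    "first_name": "Ada",
    "last_name": "Example",
    "phone": "555-0100",
    "email": "ada@example.com",
    "visit_year": 2017,
}

deidentified = strip_identifiers(record)
print(deidentified)
```

The function returns a new dictionary rather than modifying the original record, so the source data remains available for any processing that is still permitted on it.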