Data Protection

Why data anonymisation offers no guarantee of privacy protection

By Matt Lock, UK Technical Director at Varonis 

Recent EU GDPR regulations, in common with similar initiatives around the world, are intended to overhaul decades-old standards of protection and commercial accountability for consumer data.

The aim is to raise the bar and make those standards fit for today’s ever-connected, online world. The measures place greater emphasis on a person’s individual right to privacy. This includes requiring organisations to implement security measures that prevent personally identifiable information (PII) from being shared with third parties without the individual’s knowledge or consent.

Much of the attention thus far has centred on what happens when the customer data harvested by companies is used to support their sales and marketing campaigns.

Less discussed, perhaps, is the information routinely collected for research purposes. From shaping product development to advancing medical studies, research data is gathered purely for analysis. Researchers have no intention of ever contacting or involving individual participants.

In such circumstances, the data is meant to be anonymous. EU GDPR, for example, defines anonymous information as “information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable”.

Data that is properly anonymised is outside the scope of GDPR or other such regulations. It leaves organisations free to continue to collect and analyse large data sets for research.

As well as excluding the data from the purview of requirements such as data subject access requests (DSARs), anonymisation protects both the business and the individuals in the data set if the information is stolen or leaked. With no identifiable information, conventional wisdom dictates there is no need to notify anyone and no risk of legal action or regulatory fines.

Anonymisation is not always watertight

Firms holding on to stores of anonymous data, however, cannot afford to be complacent. Recent research from Belgium’s Université Catholique de Louvain (UCLouvain) and Imperial College London found data can often be easily de-anonymised by cross-referencing it against other data sets. The study pointed out, for example, that a research-based dataset containing 15 demographic attributes could be used to identify individuals in Massachusetts with 99.98% accuracy.
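To make the cross-referencing idea concrete, here is a minimal sketch of a linkage attack in Python. It is not the method used in the UCLouvain and Imperial study; the records, field names and the `reidentify` helper are all hypothetical and invented purely for illustration. It assumes an attacker holds a second, named data set that shares a few demographic attributes with the “anonymous” one.

```python
from collections import defaultdict

# Hypothetical toy data: every name, field and value below is made up.
research = [  # "anonymised" research records, names removed
    {"zip": "02138", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1982, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "02139", "birth_year": 1982, "sex": "M", "diagnosis": "migraine"},
]
public = [  # a separate, named data set the attacker already holds
    {"name": "Alice Example", "zip": "02138", "birth_year": 1975, "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_year": 1982, "sex": "M"},
    {"name": "Carl Example", "zip": "02139", "birth_year": 1982, "sex": "M"},
]

# Attributes present in both data sets (an assumption of this sketch).
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def key(record):
    """Combine the shared attributes into one comparable value."""
    return tuple(record[q] for q in QUASI_IDENTIFIERS)


def group_by_key(rows):
    """Group records by their combination of shared attributes."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    return groups


def reidentify(research_rows, public_rows):
    """Return {name: research_row} where the combination is unique in both sets."""
    research_groups = group_by_key(research_rows)
    public_groups = group_by_key(public_rows)
    matches = {}
    for k, rows in research_groups.items():
        candidates = public_groups.get(k, [])
        if len(rows) == 1 and len(candidates) == 1:
            matches[candidates[0]["name"]] = rows[0]
    return matches


matches = reidentify(research, public)
for name, row in matches.items():
    print(name, "->", row["diagnosis"])
```

In this toy case only one record is re-identified, because the other two share the same combination of attributes. Each extra attribute added to the key makes such shared combinations rarer.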

Alarmingly, the smaller the dataset, the easier it becomes to identify individual contributors. If you consider that it is relatively straightforward to de-anonymise the data for an entire US state, the average town or city is a piece of cake by comparison.

Such risks are more than just theory. In one instance, the home addresses of New York cabbies were uncovered through an anonymous dataset of individual trips around the city. In another case, data that was supposed to have been anonymised by the Australian health department was easily re-identified by cross-referencing facts such as birth years and the number of children in the family.

The lesson here is that organisations that think the data they have collected is anonymised and safe should think again, because it may still be at risk. If there’s any chance of turning anonymised data back into PII, it will once again be subject to GDPR and other regulations. Potentially, this means that if data that was once anonymised is stolen or leaked in a security incident, it would still constitute a technical regulatory breach.

Best data management practices

Although we usually hear of regulatory action in the aftermath of a security incident, it’s always worth bearing in mind that GDPR’s primary function is to govern how personal information is handled and secured.

British Airways, for example, was officially fined for “poor cyber security arrangements”. An organisation can be the victim of a cyber-attack but not be punished by the regulators if it is deemed to have taken appropriate steps to secure the personal data in its care.

Aside from taking the proper security precautions when handling personal data, organisations should also be able to demonstrate a strong case as to why they needed the information in the first place. Everything on the system, from financial records to sales activity, needs to be fully justified.

Over time, organisations are often guilty of hoarding more data than they need. A recommended housekeeping technique for dealing with the build-up of digital clutter is to perform a data audit. It’s an exercise that allows companies to locate and classify sensitive information, regardless of whether it resides on-premises or in the cloud. Moreover, there are plenty of automation tools available that make the daunting task of identifying stale or duplicate data quite straightforward.
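As a rough illustration of the stale and duplicate part of such an audit, the sketch below walks a directory tree with the Python standard library. It is not how any particular commercial tool works. The function names and the age threshold are assumptions made for this example, it covers only a local file system, and it does not attempt to classify sensitive content.

```python
import hashlib
import time
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return groups of files under root whose contents are byte-identical."""
    by_digest = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


def find_stale(root, max_age_days, now=None):
    """Return files under root not modified within the last max_age_days days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds in a day
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )
```

Calling `find_duplicates("some/folder")` and `find_stale("some/folder", 365)` (the path is a placeholder) returns lists a reviewer can then go through by hand. Modification time is only a proxy for staleness, so the results are candidates for review rather than a deletion list.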

To anonymise or not to anonymise?

When it comes to handling personal data, the safest approach is to always secure written consent from the individuals concerned. With this in place, the organisation should be clear to retain and internally share the data for as long as the agreement lasts.

Data anonymisation is not an excuse to dispense with consent. Frankly, any organisation that seeks to harvest data without appropriate authorisation is likely to be up to no good with it.

This means that firms should ensure they get full consent even in a pure research scenario where the chances of having to contact or identify people on an individual basis are negligible.

The risks associated with datasets being de-anonymised cannot be overstated. Firms need to think about how easy it might be for threat actors to de-anonymise their data with the help of publicly available information, and how serious it would be if that data became public.

For example, an anonymised data set that displayed banal, harmless information such as preference of supermarket would pose minimal risk. On the other hand, sensitive information such as a medical history or criminal record presents a much greater concern.
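The first of those two questions, how easy re-identification might be, can be estimated roughly before any data is released. One common measure, not named in this article but widely used in the privacy literature, is k-anonymity: the size of the smallest group of records sharing the same combination of indirectly identifying attributes. The sketch below is a minimal illustration with made-up records and hypothetical field names, not a complete risk assessment.

```python
from collections import Counter


def group_sizes(rows, quasi_identifiers):
    """Count how many records share each combination of the given attributes."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest group; 1 means at least one record is unique."""
    sizes = group_sizes(rows, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def unique_fraction(rows, quasi_identifiers):
    """Share of records whose attribute combination appears only once."""
    if not rows:
        return 0.0
    sizes = group_sizes(rows, quasi_identifiers)
    return sum(1 for n in sizes.values() if n == 1) / len(rows)


# Hypothetical records: all values are invented for illustration.
rows = [
    {"postcode": "AB1", "birth_year": 1980, "children": 2},
    {"postcode": "AB1", "birth_year": 1980, "children": 2},
    {"postcode": "AB1", "birth_year": 1975, "children": 0},
    {"postcode": "AB1", "birth_year": 1975, "children": 1},
]

print(k_anonymity(rows, ["postcode"]))                            # 4
print(k_anonymity(rows, ["postcode", "birth_year"]))              # 2
print(k_anonymity(rows, ["postcode", "birth_year", "children"]))  # 1
print(unique_fraction(rows, ["postcode", "birth_year", "children"]))  # 0.5
```

Each added attribute shrinks the groups, which is why a handful of innocuous-looking fields can single out an individual. A low value of k signals that the data deserves closer scrutiny before it is treated as anonymous. A high value is not a guarantee of safety on its own.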

For any businesses in doubt, the Information Commissioner’s Office provides plenty of advice on how best to use anonymisation to protect data for legitimate research purposes. In general terms, however, it is fine to store data in anonymised form provided it can be demonstrated that all reasonable steps have been taken to protect the privacy of sampled individuals.

About the author 

With 20 years’ cyber security experience, Matt Lock is an expert on data security and a regular speaker and media commentator. An accomplished CISSP Security Consultant, he’s worked with world-leading organisations across insurance, pharmaceuticals, legal, health, entertainment, retail and utilities.

About Varonis

Varonis is a pioneer in data security and analytics, fighting a different battle than conventional cybersecurity companies. Varonis focuses on protecting enterprise data: sensitive files and emails; confidential customer, patient and employee data; financial records; strategic and product plans; and other intellectual property.

The Varonis Data Security Platform detects insider threats and cyberattacks by analyzing data, account activity and user behavior; prevents and limits disaster by locking down sensitive and stale data; and efficiently sustains a secure state with automation. With a focus on data security, Varonis serves a variety of use cases, including governance, compliance, classification and threat analytics.

Varonis started operations in 2005 and, as of September 30, 2019, had approximately 6,900 customers worldwide, spanning leading firms in the financial services, public, healthcare, industrial, insurance, energy and utilities, consumer and retail, media and entertainment, technology and education sectors.