
How to handle GDPR in unstructured data

Handling GDPR in unstructured data means being able to:

  1. Quickly scan a large number of files of various file types and find PII.
  2. Determine who, or which part of the organization, each file containing sensitive data belongs to.
  3. Distribute information about the sensitive data, in an easy and straightforward way, to the Data Controller, the Data Processor and the DPO, so that they can make sure policies are followed.

GDPR applies to all types of systems where personal data is stored. Most GDPR projects start with structured data, and often don’t get any further. Structured data includes databases, document management systems and CRM systems. These systems are generally searchable and taggable by default. Once an organization has decided what constitutes PII (Personally Identifiable Information), implementing policies in them is relatively straightforward.

Unstructured data (especially file data) is a different beast altogether. It comes in a wide range of file types and formats, and it is usually spread across different platforms and locations. The files aren’t necessarily easily searchable. Even when they are, the content varies from file to file, which makes patterns harder to establish. On top of that, most organizations lack a good data ownership policy. Files are often owned by system or application accounts, and there is no hierarchical structure through which information about these files can be escalated.

To tackle GDPR compliance in unstructured data, you need to:

  1. Define rules about what constitutes sensitive data and PII in your organization, e.g. social security numbers (SSN), address information, contracts, etc.
  2. Understand where the organization’s unstructured data resides: on-premises file shares, SharePoint, OneDrive, etc.
  3. Establish and maintain a system of data ownership to ensure information will be handled by the appropriate person.
  4. Scan this data regularly to discover files that likely contain sensitive data.
  5. Establish policies on how sensitive data should be handled, which data should be kept in which system, etc.
  6. Tag the identified data and distribute this information across the organization, so that the data owners themselves can act on the sensitive data (move it, delete it, etc.).
  7. Aggregate information about sensitive data and share it with Data Stewards, department heads and executives to ensure that the rules are followed.
  8. Create a workflow for informing individuals about which personal data is held about them, and for accurately deleting that data when required.
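Steps 1 and 4 can be illustrated with a small rule-based scanner. The sketch below is only illustrative. The rule names (`ssn_us`, `email`), the patterns and the functions `scan_text` and `scan_tree` are hypothetical. Each rule is a regular expression that maps to the matches it finds, and the scanner reads only plain-text files. A real deployment would need per-organization rules, extra validation such as checksums or context words to cut false positives, and text extraction for formats like PDF or DOCX.

```python
import re
from pathlib import Path

# Hypothetical, illustrative rules: each organization defines its own (step 1).
PII_RULES = {
    "ssn_us": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US SSN format only
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
}


def scan_text(text: str, rules=PII_RULES) -> dict[str, list[str]]:
    """Return each rule that matched, mapped to the strings it matched."""
    hits = {}
    for name, pattern in rules.items():
        found = pattern.findall(text)
        if found:
            hits[name] = found
    return hits


def scan_tree(root: str, suffixes=(".txt", ".csv")) -> dict[str, dict[str, list[str]]]:
    """Walk a directory tree and scan plain-text files (step 4).

    Other formats (PDF, DOCX, ...) would need a text extractor first;
    this sketch simply skips them.
    """
    results = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            hits = scan_text(text)
            if hits:
                results[str(path)] = hits
    return results
```

For example, `scan_text("Contact jane@example.com, SSN 123-45-6789")` returns `{"ssn_us": ["123-45-6789"], "email": ["jane@example.com"]}`. To run the scan regularly, a scheduler could call `scan_tree` and store the results with a timestamp.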

Northern can assist with GDPR compliance for unstructured data by:

  1. Giving an overview of the environment: which data repositories contain user-generated unstructured data, who is saving which files, and where those files are saved.
  2. Finding likely file and share owners, for example based on recent activities.
  3. Gathering metadata on files to pinpoint potential areas of interest.
  4. Scanning the text in files and using specific keywords or regular expressions to find potentially sensitive information.
  5. Creating actionable dashboards that can be distributed to data owners as well as data stewards.
  6. Assisting in establishing policies around GDPR and sensitive data based on best practices.
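Item 2, finding likely owners from recent activity, can be sketched under a few assumptions. Every name here is hypothetical: the `FileEvent` record, its fields, and `likely_owners`. The sketch assumes a file-activity log with one event per user action and absolute, POSIX-style paths. Within a time window, it suggests as owner of each top-level folder (a "share") the user with the most write events. This is not Northern's method. It is one simple heuristic.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from pathlib import PurePosixPath


@dataclass
class FileEvent:
    """One entry from a hypothetical file-activity log."""
    user: str
    path: str    # assumed absolute, e.g. "/finance/q1/report.xlsx"
    action: str  # e.g. "write" or "read"
    when: datetime


def likely_owners(events, now: datetime, window_days: int = 90, depth: int = 1) -> dict[str, str]:
    """Suggest an owner per folder: the user with the most writes in the window.

    depth=1 groups events by top-level folder. Events older than the window
    are ignored, so stale activity does not decide ownership. Ties go to the
    user seen first.
    """
    cutoff = now - timedelta(days=window_days)
    writes: dict[str, Counter] = defaultdict(Counter)
    for e in events:
        if e.action != "write" or e.when < cutoff:
            continue
        parts = PurePosixPath(e.path).parts      # ("/", "finance", "q1", ...)
        folder = str(PurePosixPath(*parts[: depth + 1]))  # e.g. "/finance"
        writes[folder][e.user] += 1
    return {folder: counts.most_common(1)[0][0] for folder, counts in writes.items()}
```

The suggested owner is a starting point that a person confirms. It does not replace an ownership policy.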

While less straightforward, including unstructured data in a GDPR project is essential to meeting regulatory requirements. Getting a better handle on unstructured data also enables ROT (redundant, obsolete, trivial) clean-up. That clean-up increases cost efficiency and reduces risk well beyond what the regulation requires.
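The "redundant" and "obsolete" parts of ROT clean-up lend themselves to simple automated checks. The sketch below is illustrative only. The function names and the three-year staleness threshold are assumptions. It treats files with identical SHA-256 content hashes as redundant, and files whose modification time is older than the threshold as obsolete candidates. "Trivial" depends on business rules, so the sketch leaves it out.

```python
import hashlib
import time
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of the file contents, read in chunks so large files are not loaded at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_rot_candidates(root: str, stale_days: int = 3 * 365):
    """Return (duplicate groups, stale files) under root.

    Redundant: files with byte-identical content (same hash).
    Obsolete: files not modified within stale_days (assumed threshold).
    """
    by_hash = defaultdict(list)
    stale = []
    cutoff = time.time() - stale_days * 86400
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        by_hash[file_digest(path)].append(str(path))
        if path.stat().st_mtime < cutoff:
            stale.append(str(path))
    duplicates = [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
    return duplicates, sorted(stale)
```

The results are candidates for review by data owners, not a deletion list. Modification time alone can flag files that must still be retained.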
