Minimisation of personal data in scientific research

The need to process personal data for a scientific study must be assessed at the earliest possible stage, and the processing of personal data must be kept to a minimum. Both the amount and the nature of the personal data processed for the study need to be considered.

The GDPR emphasises the need to minimise data, particularly when personal data is processed for purposes of scientific research. Personal data must be adequate, relevant and limited to what is necessary in relation to the purposes for which it is processed.

Studies should be carried out without personal data whenever possible. If the data processed for the study is anonymous, such as aggregated statistics, it is not subject to data protection regulations. The goal of anonymisation is to render the data unidentifiable so that individuals can no longer be distinguished from it. Identification must be prevented permanently, so that neither the controller nor a third party can convert the data back into identifiable form using the information they hold.
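
Aggregated statistics are one way to produce output without individual-level records. The sketch below is a minimal illustration under stated assumptions, not a prescribed method: it assumes records are dictionaries with a hypothetical grouping field, and it suppresses any count below an illustrative minimum cell size of 5, because very small cells can point to individuals. The threshold is an assumption, not a legal requirement, and suppression alone does not guarantee that a table is anonymous.

```python
from collections import Counter

# Illustrative minimum cell size (an assumption, not a legal threshold).
MIN_CELL_SIZE = 5

def aggregate_counts(records, key, min_cell_size=MIN_CELL_SIZE):
    """Count records per value of `key`, suppressing small cells.

    Groups with fewer than `min_cell_size` members are reported as None
    so that the published table does not single out individuals.
    """
    counts = Counter(record[key] for record in records)
    return {
        group: (n if n >= min_cell_size else None)
        for group, n in sorted(counts.items())
    }

if __name__ == "__main__":
    # Hypothetical records for illustration only; no real data.
    records = [{"municipality": "A"}] * 7 + [{"municipality": "B"}] * 2
    print(aggregate_counts(records, "municipality"))  # {'A': 7, 'B': None}
```

Only the aggregated table would leave the research environment; the individual-level records it is computed from remain personal data until they are erased or anonymised.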

The processing of anonymous data is less restricted and also safer from the researcher’s perspective. Anonymous data facilitates international cooperation, as differences between countries’ data protection regulations will not complicate the implementation of the study.

Personal data may only be processed for research if the study could not be carried out with anonymous data. The processing of personal data must be limited to the data necessary for the subject and purpose of the study. Data protection regulations apply to all information related to an identified or identifiable individual.

Example:
The names and addresses of research participants are needed for sending questionnaires and reminders. When the last reminder has been sent, the response period has elapsed and the research data has been gathered, the names and addresses can be erased.

In scientific research, data minimisation is often implemented by pseudonymising the data needed for the study. Pseudonymisation can usually be carried out immediately after the data required for the study has been compiled. Pseudonymisation means processing personal data in such a manner that it can no longer be attributed to a specific person without the use of additional information, provided that this additional information is kept separately and protected. Coding personal data, that is, replacing identifiers with codes, is one example of pseudonymisation. However, the holder of the code key can still easily decode the records and identify each data subject. Personal data can also be protected with pseudonyms (false names), for example by replacing a data item related to the individual with another value in a database. Pseudonymised data is still personal data, and its processing is subject to data protection regulations.
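
Coding with a separately held code key can be sketched as follows. This is a minimal illustration, not a prescribed method: it assumes records are dictionaries with a hypothetical `name` field as the direct identifier, and uses the standard-library `secrets` module to generate random codes. The returned code key is the "additional information"; because it allows re-identification, the coded records remain personal data.

```python
import secrets

def pseudonymise(records, id_field="name"):
    """Replace the direct identifier in each record with a random code.

    Returns (pseudonymised_records, code_key). The same identifier always
    receives the same code, so records about one person stay linked. The
    code key must be stored separately from the research data, under its
    own access controls.
    """
    code_key = {}     # code -> identifier
    codes_by_id = {}  # identifier -> code
    pseudonymised = []
    for record in records:
        identifier = record[id_field]
        if identifier not in codes_by_id:
            code = secrets.token_hex(8)  # random, not derived from the identifier
            while code in code_key:      # regenerate on an (unlikely) collision
                code = secrets.token_hex(8)
            codes_by_id[identifier] = code
            code_key[code] = identifier
        rest = {k: v for k, v in record.items() if k != id_field}
        pseudonymised.append({"code": codes_by_id[identifier], **rest})
    return pseudonymised, code_key

def reidentify(pseudonymised_record, code_key):
    """For the holder of the code key, decoding is a simple lookup."""
    return code_key[pseudonymised_record["code"]]

if __name__ == "__main__":
    # Hypothetical participants, for illustration only.
    records = [
        {"name": "Participant A", "score": 12},
        {"name": "Participant B", "score": 9},
    ]
    data, key = pseudonymise(records)
    print(data)  # codes are random, so this differs on every run
    print(reidentify(data[0], key))  # Participant A
```

Random codes are used here rather than, for example, a plain hash of the name, because anyone who can guess a name can recompute its unkeyed hash and thereby link the record back to the person.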

Personal data can also be minimised by removing directly identifying information, such as names and personal identity codes, from the material. It is important to note that removing direct identifiers rarely renders the research data anonymous, because individuals can be identified by data other than their names. The collected material can contain such detailed information on individuals (e.g. rare diseases, or enough different types of data) that they are indirectly identifiable from it. In addition to research variables, personal background variables (such as gender, marital status, year of birth, age and education) also increase the likelihood of identification. Some persons can easily be identified by their profession alone (e.g. a president). Each additional type of data combined with the others (such as date of birth, country, year and term of office) makes identification progressively easier, and with enough descriptive data types anyone can be identified. Studies often collect extensive data on individual research subjects, which in many cases enables identification even without direct identifiers.
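
One common way to measure this risk, not prescribed by the text above, is k-anonymity: k is the size of the smallest group of records that share the same values for a chosen set of background variables, and k = 1 means at least one record is unique on those variables. The sketch below assumes records are dictionaries; the field names and values are hypothetical. It shows k falling as more background variables are combined.

```python
from collections import Counter

def smallest_group_size(records, quasi_identifiers):
    """Return k: the size of the smallest group of records sharing the same
    values for every variable in `quasi_identifiers`.

    k == 1 means at least one record is unique on these variables, so the
    person it describes may be identifiable without a name. Returns 0 for
    an empty data set.
    """
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values(), default=0)

if __name__ == "__main__":
    # Hypothetical records for illustration only; no real research data.
    records = [
        {"gender": "F", "birth_year": 1980, "profession": "teacher"},
        {"gender": "F", "birth_year": 1980, "profession": "nurse"},
        {"gender": "F", "birth_year": 1980, "profession": "teacher"},
        {"gender": "M", "birth_year": 1975, "profession": "teacher"},
        {"gender": "M", "birth_year": 1975, "profession": "teacher"},
    ]
    for variables in (["gender"],
                      ["gender", "birth_year"],
                      ["gender", "birth_year", "profession"]):
        print(variables, smallest_group_size(records, variables))
    # ['gender'] 2
    # ['gender', 'birth_year'] 2
    # ['gender', 'birth_year', 'profession'] 1
```

A high k does not by itself make data anonymous: if everyone in a group shares the same sensitive value, or if outside information is available, individuals may still be identified.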

Example:
A study collects data on an upper secondary school teacher from a small town but uses a nickname in place of the teacher’s real name. Because the nickname is widely known, the other data concerning the teacher can readily be recognised as applying to him once it is connected to the nickname. Even though the study does not collect the teacher’s real name, the combination of a small town and a well-known nickname makes it possible to connect the data indirectly to the individual.

Throughout the duration of the study, and especially at its conclusion, the controller must ensure that personal data is not processed more extensively than necessary for the original purposes. The storage time of personal data must be minimised.

Example:
Research forms should be designed so that the research subjects’ identifying information can be erased as soon as it is no longer needed for compiling the data. For example, personal data can be placed on a part of the form that is easy to cut off.

Anonymisation and pseudonymisation should be performed as soon as possible, for instance right after the data has been compiled. If personal data must be processed, the controller must set dates for reviewing whether its continued storage is necessary.
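
Tracking review dates can be as simple as a register that flags data sets whose date has arrived. The sketch below is a hypothetical illustration only: the `StoredDataset` class, its fields and the dates are assumptions, since the requirement is to set review dates, not to use any particular tool. The `contact_details` entry mirrors the earlier example of names and addresses that are needed only until the last reminder.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class StoredDataset:
    """Hypothetical register entry for one data set held by the controller."""
    name: str
    contains_personal_data: bool
    review_date: date  # date set for reviewing whether storage is still needed

def due_for_review(datasets, today: Optional[date] = None):
    """Return names of personal-data sets whose review date has arrived."""
    today = today or date.today()
    return [
        d.name
        for d in datasets
        if d.contains_personal_data and d.review_date <= today
    ]

if __name__ == "__main__":
    register = [
        # e.g. names and addresses, needed only until the last reminder
        StoredDataset("contact_details", True, date(2024, 3, 31)),
        StoredDataset("code_key", True, date(2026, 12, 31)),
        StoredDataset("aggregate_tables", False, date(2024, 1, 1)),
    ]
    print(due_for_review(register, today=date(2025, 1, 1)))  # ['contact_details']
```

A flagged entry prompts a decision rather than automatic deletion: the controller reviews whether the data is still necessary and either erases it or sets a new review date.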

More information on anonymisation techniques for personal data is available in the Article 29 Working Party’s Opinion 05/2014 on Anonymisation Techniques, published on the website of the European Data Protection Board, which has replaced the Working Party.