Data protection in the development and use of AI systems
These pages have information on the requirements arising from data protection legislation that should be taken into account when artificial intelligence (AI) systems are developed and used. These guidelines are not exhaustive, and organisations developing or deploying AI systems must always assess the requirements laid down in legislation on a case-by-case basis. AI systems are also subject to other legislation, such as the EU AI Act.
On this page you will find information on the following topics:
- What is an AI system?
- How must data protection legislation be considered in AI systems?
- Assess risks and data protection impact
- Ensure the lawfulness of processing: when can personal data be used?
- Prohibited practices and high-risk AI systems
- Take data protection principles into account
- What must be taken into account in automated decision making?
- Process personal data securely
- Respect people’s data protection rights
- Demonstrate compliance with data protection legislation
What is an AI system?
In these guidelines, ‘AI system’ means a system designed to analyse data, recognise patterns and use the data to produce decisions, content or predictions.
Data protection legislation does not include a definition for AI or AI systems. The EU AI Act defines ‘AI system’ as ‘a machine-based system that is designed to operate with varying levels of autonomy and that may exhibit adaptiveness after deployment, and that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments’.
The European Commission has published guidance on the definition of AI systems (on the Commission's website).
The lifecycle of an AI system has two stages: the development stage and the operation stage. The development stage covers all activities before the deployment of the system, such as the development of the algorithm, collecting and processing the training data, and training the system. The operation stage begins when the AI system is deployed and introduced to the use it was designed for.
Examples of common AI systems we encounter in our daily lives
- The recommendation systems of streaming services that analyse users’ listening or viewing histories, recognise patterns in users’ interests and use this information to recommend new content to the users.
- The recommendation systems of online shops that analyse the purchase history of users, recognise patterns in users’ interests and use this information to recommend new products to users.
- Email spam filters that analyse email messages and recognise features based on which they filter out spam messages from other messages.
- Map services and navigation tools that analyse traffic data and suggest routes based on it.
- Search engines that analyse the behaviour of users and recommend search results based on the analysis.
- Chatbots used for customer service that analyse the information entered into them and generate responses based on the analysis.
- Personal AI assistants that help users create content, compile information from several sources, and use the information to create schedules and to-do lists.
How must data protection legislation be considered in AI systems?
Data protection legislation must be complied with regardless of the technology used, including in AI systems. Data protection legislation must always be complied with when personal data is processed automatically.
Large quantities of personal data are often used in the development and use of AI systems. A legal basis must exist for the processing, and data protection principles such as data minimisation and purpose limitation must be taken into account.
If no personal data is processed in the development or use of an AI system, data protection legislation does not apply. To be certain whether it processes personal data in connection with an AI system, an organisation must carefully familiarise itself with the definitions of personal data and personal data processing, as well as with the AI system to be deployed and its operation.
Read more:
- What is personal data?
- Pseudonymised and anonymised data
- Opinion 28/2024 of the European Data Protection Board on certain data protection aspects related to the processing of personal data in the context of AI models (link directs to the website of the Board)
- Opinion 05/2014 of the Article 29 Data Protection Working Party (link opens a PDF file from the ec.europa.eu website)
Assess risks and data protection impact
An organisation developing or deploying an AI system must always assess the risks of personal data processing before it starts processing personal data. This ensures that the organisation can, already at the planning stage, determine the measures it must take to control the risks and ensure that the processing is lawful.
Organisations must always comply with the data protection principles in their operations and ensure that the principles are implemented according to the risk level of the planned processing. Risks must be assessed specifically from the perspective of the people whose data is processed (the data subjects), and the risks to which the personal data processing could expose them must be identified. A risk assessment can also be a useful tool for assessing the risks to which the organisation itself is exposed.
One tool for risk assessment is the data protection impact assessment (DPIA). According to the General Data Protection Regulation (GDPR), an impact assessment must be carried out especially when new technology will be used for personal data processing.
Read more about impact assessments
Carrying out an impact assessment is mandatory in certain situations. An impact assessment must always be carried out when the personal data processing could pose a high risk to the data subjects’ rights and freedoms. If at least two of the following criteria are met, the risk is considered high (a simple screening sketch follows the list below). The development of AI systems often meets the criteria for high risk.
The criteria for high risk
- Evaluation or scoring of natural persons
- Automated decision making that could have legal effects for natural persons
- Systematic monitoring of people
- Processing of special categories of personal data or other very private data
- Extensive processing of personal data
- Combining datasets
- Processing personal data belonging to vulnerable persons such as children
- Innovative use or application of new technological or organisational solutions
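The two-criteria rule of thumb can be pictured as a minimal screening step, sketched below in Python. The criterion labels are illustrative names for the list above, and the sketch is not a substitute for a case-by-case legal assessment.

```python
# Illustrative screening only: a real DPIA decision always requires a
# case-by-case legal assessment. Labels correspond to the criteria above.
HIGH_RISK_CRITERIA = {
    "evaluation_or_scoring",
    "automated_decisions_with_legal_effects",
    "systematic_monitoring",
    "special_categories_or_private_data",
    "extensive_processing",
    "combining_datasets",
    "vulnerable_data_subjects",
    "new_technology_or_innovative_use",
}

def dpia_presumed_required(met_criteria: set[str]) -> bool:
    """Return True when at least two high-risk criteria are met."""
    unknown = met_criteria - HIGH_RISK_CRITERIA
    if unknown:
        raise ValueError(f"Unknown criteria: {unknown}")
    return len(met_criteria) >= 2

# Example: training an AI model on large, combined datasets.
print(dpia_presumed_required({"extensive_processing", "combining_datasets"}))  # True
```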
An impact assessment is also mandatory when the planned processing activities are included in the Data Protection Ombudsman’s list of processing operations which require data protection impact assessment. An organisation planning to develop or deploy an AI system should familiarise itself with the list and the criteria for high risk when it assesses whether a DPIA is required.
Even if carrying out a DPIA is not mandatory, the organisation can benefit from the process whenever it plans activities that include personal data processing. A DPIA supports compliance with the requirements of data protection legislation.
Ensure the lawfulness of processing: when can personal data be used?
A legal basis, also called a processing basis, is always required for personal data processing. A processing basis is required both for the development of AI systems and for their use whenever personal data is processed. The basis must exist already at the stage where the personal data is collected and used for the development and training of the AI system.
The processing of different types of personal data, and the different stages of the development and deployment of an AI system, should be distinguished from each other. This makes it possible to choose, where necessary, the processing bases and risk management measures best suited to each stage.
According to Article 6 of the GDPR, the bases for processing personal data are the data subject’s consent, contract, the controller’s legal obligation, protection of vital interests, a task carried out in the public interest or the exercise of official authority, and the legitimate interest of the controller or a third party.
Read more about processing bases
Consent can be used as the basis of personal data processing both in the development and use of an AI system. An individual can give consent to processing their personal data for one or more purposes. Such consent must be a freely given, specific, informed and unambiguous indication of the data subject’s agreement to the processing of personal data relating to them.
The controller, meaning the organisation developing or using an AI system, must be able to prove that consent has been given as required by law. It must also be possible to withdraw consent free of charge at any time and as easily as it was given. After a data subject withdraws consent, the processing of the data processed on the basis of that consent must be stopped immediately and the data erased. The withdrawal of consent must be implemented effectively in AI systems as well. If withdrawal cannot be implemented, for example for technological reasons, consent cannot be used as the processing basis.
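As a simplified illustration of these requirements, the following sketch (with hypothetical structures and names) records consent per data subject and purpose and, upon withdrawal, immediately blocks further processing and erases the data processed on the basis of that consent.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class ConsentRecord:
    # Hypothetical, simplified model: one record per data subject and purpose.
    subject_id: str
    purpose: str
    given_at: datetime
    withdrawn_at: datetime | None = None

consents: dict[tuple[str, str], ConsentRecord] = {}
consent_based_data: dict[str, list[str]] = {}   # subject_id -> stored data items

def may_process(subject_id: str, purpose: str) -> bool:
    record = consents.get((subject_id, purpose))
    return record is not None and record.withdrawn_at is None

def withdraw_consent(subject_id: str, purpose: str) -> None:
    """Withdrawal must be as easy as giving consent and take effect immediately."""
    consents[(subject_id, purpose)].withdrawn_at = datetime.now(timezone.utc)
    # Processing based on this consent must stop and the data must be erased.
    consent_based_data.pop(subject_id, None)
```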
The Data Protection Ombudsman stresses that choosing consent as the processing basis at the development stage of an AI system can be burdensome in terms of administrative work required, especially if personal data of a large group of people is processed.
Read more about the requirements for using consent as a basis
Contracts can be used as the basis of personal data processing both in the development and use of an AI system. When a natural person is party to a contract, personal data of the person can be processed if it is necessary for the performance of the contract.
Legal obligation can be used as a processing basis both in the development and use of AI systems if the personal data processing is necessary for compliance with a legal obligation to which the controller (the organisation developing or using an AI system) is subject.
Protection of vital interests can be used as a processing basis in the development and use of an AI system only if the processing is necessary in order to protect the vital interests of the data subject or another natural person. For example, personal data processing may protect a vital interest in situations requiring humanitarian aid, such as natural disasters or epidemics.
For this requirement to be met, the danger to people’s lives, for example, must be sufficiently concrete. This processing basis is therefore very rarely applicable.
Personal data can be processed in the development and use of AI systems if the processing is necessary for the performance of a task carried out in the public interest or in the exercise of official authority vested in the controller. The public interest task or official authority must be based on law or another legal provision. ‘Controller’ means the organisation developing or using an AI system.
Personal data can be processed in the development and use of AI systems if the processing is necessary for the purposes of a legitimate interest pursued by the controller or a third party. ‘Controller’ means the organisation developing or using an AI system.
For example, a legitimate interest may exist when there is a meaningful relationship between the data subject and the controller. The relationship between an organisation and its customer is one example of such a relationship. However, personal data may not be processed if the interests and rights of the data subject override the legitimate interest of the organisation.
The European Data Protection Board has published guidelines on matters that must be taken into account when determining whether a legitimate interest exists.
The determination has three stages, as follows (a documentation sketch follows the list):
1. Identification and description of the legitimate interest. In order to use legitimate interest as a processing basis, all of the following must be met:
- The legitimate interest pursued is legal.
- The legitimate interest is clearly communicated and justified.
- The legitimate interest is real and present, not merely speculative.
2. Assessing the necessity of the planned personal data processing
- Do the planned processing activities promote achieving the legitimate interest?
- Are there alternative implementation methods that would require less personal data processing?
3. Determining whether the legitimate interest is overridden by the rights and freedoms of natural persons (‘balancing test’). The determination must be made on a case-by-case basis, taking into account in particular:
- The risks to which the development and use of the AI system could expose the rights and freedoms of natural persons.
- The effects that the personal data processing in the development and use of an AI system could have on natural persons. The effects can be of different types, and they can be positive or negative.
- The nature of the personal data to be processed, the context of the processing activities, and the possible consequences of the processing for the natural person.
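For accountability purposes, the outcome of the three-stage test can be documented in a structured form. The sketch below is a hypothetical documentation aid with illustrative field names; it does not replace the legal assessment itself.

```python
from dataclasses import dataclass

@dataclass
class LegitimateInterestAssessment:
    # Stage 1: identification and description of the interest.
    interest_description: str
    interest_is_lawful: bool
    interest_is_clearly_articulated: bool
    interest_is_real_and_present: bool
    # Stage 2: necessity of the planned processing.
    processing_furthers_the_interest: bool
    no_less_intrusive_alternative: bool
    # Stage 3: outcome of the case-by-case balancing test.
    interest_not_overridden_by_rights: bool

    def basis_available(self) -> bool:
        """Legitimate interest can be relied on only if every stage is satisfied."""
        return all((
            self.interest_is_lawful,
            self.interest_is_clearly_articulated,
            self.interest_is_real_and_present,
            self.processing_furthers_the_interest,
            self.no_less_intrusive_alternative,
            self.interest_not_overridden_by_rights,
        ))
```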
An organisation that plans to use legitimate interest as the basis for processing personal data in the development or use of an AI system should read the opinion of the EDPB (link directs to the EDPB website): Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models
If the development, training or use of an AI system involves processing data belonging to special categories of personal data, such as health data, a special processing basis must also exist. The special processing bases are laid down in Article 9 of the GDPR. These include explicit consent of the data subject or reasons laid down in law.
Read more about special categories of personal data and special processing bases
An organisation developing or using an AI system must choose which personal data processing basis or bases it will use. Among other matters, the basis affects what rights the data subjects have. AI systems must be designed and implemented in such a way that they allow data subjects to exercise their data protection rights. For example, if the personal data processing is based on legitimate interest, the organisation must be able to implement the data subjects’ right to object to the processing of their personal data.
Read more about data subjects’ rights in different situations
In order for personal data processing to be lawful, other legislation must also be complied with.
Prohibited practices and high-risk AI systems
Using AI systems is not always allowed. Certain AI system use cases are entirely prohibited under Article 5 of the EU AI Act.
The AI Act also determines what types of AI systems are considered high-risk. High-risk systems are, in particular, systems that can have a significant harmful impact on the health, safety and fundamental rights of persons. High-risk AI systems may only be placed on the market or used if they meet the requirements laid down in the AI Act. The practices prohibited under the AI Act are the following:
- Harmful manipulation and deception. AI systems that use techniques that a person cannot consciously detect, or that are otherwise intentionally manipulative or deceptive, with the objective or effect of distorting the behaviour of a person or a group. The distortion appreciably impairs the person’s ability to make an informed decision, leading them to make decisions that they would not otherwise have made and that cause or are reasonably likely to cause significant harm to them or another person.
- Exploiting vulnerabilities. AI systems that exploit vulnerabilities resulting from a person’s age, disability or specific social or economic situation, with the objective or effect of materially distorting the person’s behaviour in a manner that is reasonably likely to cause significant harm.
- Social scoring. AI systems used for the evaluation or classification of persons or groups based on their social behaviour or personal characteristics, where the social score leads to detrimental or unfavourable treatment. The treatment is unjustified or disproportionate in relation to the evaluated behaviour, or it occurs in a context unrelated to the context in which the original data on the behaviour or characteristics was collected.
- Assessing and predicting criminal offences. AI systems that assess or predict the risk of a natural person committing a criminal offence based solely on profiling or on assessing the person’s personality traits and characteristics. The prohibition does not, however, apply to AI systems used to support a human assessment that is already based on objective and verifiable facts directly linked to a criminal activity.
- Untargeted scraping for the creation of facial recognition databases. AI systems that create or expand facial recognition databases through the untargeted scraping of facial images from the internet or CCTV footage.
- Inferring emotions. AI systems that infer the emotions of persons in the workplace or in education institutions. This prohibition does not apply to medical or safety purposes.
- Biometric categorisation. AI systems that categorise natural persons based on their biometric data to deduce their race, political opinions, trade union membership, religious or philosophical beliefs, sex life or sexual orientation. This prohibition does not apply to any labelling or filtering of lawfully acquired biometric datasets, such as images, for the purposes of law enforcement.
- Real-time biometric identification. Using ‘real-time’ remote biometric identification AI systems in publicly accessible spaces for the purposes of law enforcement. This prohibition does not apply to situations where such identification is strictly necessary, such as the targeted search for specific victims, the prevention of specific threats (e.g. terrorist attacks), or the localisation of persons suspected of specific crimes, in compliance with the requirements laid down in law.
Take data protection principles into account
The data protection principles laid down in the GDPR must be complied with at all stages of the development and use of an AI system.
Read more about data protection principles
Communicate about personal data processing transparently
Personal data must be processed transparently and in a manner appropriate for the purpose for which the data was collected. Data subjects must be informed transparently and clearly about the personal data processing and the information provided must not be misleading. Information must also not be processed in a manner that is unpredictable or unexpected from the perspective of the data subject.
When processing personal data in the development or use of an AI system, the data subjects must be informed of the following at the least:
- What data is collected
- The purpose for which the data is processed
- How long the data collected will be retained
- Whether the data is disclosed or sold to third parties
- How the data is processed
- What data protection rights the data subject has and how they can be exercised
The AI Act requires organisations to communicate transparently about the system to its users. For example, users must be informed when they interact with an AI system. Organisations are also obliged to provide a description of the mechanism behind any automated decision making.
When is there no obligation to inform?
There are certain exceptional situations where the obligation to inform does not apply. These exceptions must be interpreted narrowly. Providing sufficiently extensive information to data subjects is necessary for ensuring that they can be confident that their data is processed appropriately.
If, for the development of an AI system, personal data is collected from public sources or in some other manner in which the data is not received directly from the data subjects themselves, the obligation to inform may not apply if the data collection or disclosure is laid down in law or if the information cannot be provided because of a statutory non-disclosure obligation. Information that a data subject has already received does not need to be provided to them again.
The obligation to inform also does not apply, under certain conditions, when personal data is processed for archiving purposes in the public interest, for scientific or historical research purposes, or for statistical purposes, if providing the information is impossible or would require disproportionate effort. In such cases, too, the controller must make the information publicly available.
Determine the purpose of the personal data and use the data in line with the purpose
Before personal data processing begins, the purpose for which the data will be used must be clearly planned and determined. The personal data must be collected for this specific purpose, and the purpose must be lawful.
The organisation developing or using an AI system is responsible for determining the purpose of the personal data processing. Several organisations may contribute to the development or use of the AI system at its different stages. This may constitute joint controllership, meaning that the organisations are jointly responsible for determining the purposes and means of the personal data processing.
The European Data Protection Board has stated that data protection authorities must require a detailed description of the planned purpose of the personal data processing to be carried out in an AI system.
The AI Act requires that the purposes of AI systems are defined and carefully documented.
Different types of personal data may be needed for different purposes at the different stages of AI system development and use. In some cases, the same personal data is used for the same purposes both in the development and use of the system.
Personal data may not be used later for purposes that are not in line with the original purpose. Processing the data for other purposes can be allowed if the new purpose is compatible with the original purpose.
If an organisation plans to use personal data that it has collected for other purposes in the development or use of an AI system, it must assess the new purpose based on the following criteria:
- What is the connection between the original purpose and the new purpose?
- In what context was the personal data collected?
- What is the data subjects’ relationship to the party that is responsible for processing their personal data?
- What kind of processing can data subjects reasonably expect?
- What types of personal data will be processed?
- Will sensitive personal data be processed?
- What consequences could the personal data processing have for the data subjects?
- What safeguards does the organisation have in place?
If the new purpose is in line with the original purpose, the personal data can be processed based on the original processing basis. If the new purpose is not in line with the original purpose, a new processing basis must be determined.
Generally, a new purpose is not compatible if it is materially different from the original one, if the new personal data processing would be unexpected for the data subjects, or if the processing would have unjust consequences for the data subjects. Archiving in the public interest, scientific or historical research, and statistical purposes are usually compatible with original purposes, as long as sufficient safeguards are in place.
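For accountability, the compatibility assessment can also be recorded systematically. The sketch below is a hypothetical record structure whose fields mirror the criteria listed above; the conclusion itself must come from a case-by-case assessment.

```python
from dataclasses import dataclass

@dataclass
class CompatibilityAssessment:
    # Free-text answers to the criteria above, recorded for accountability.
    link_between_purposes: str
    collection_context: str
    relationship_to_data_subjects: str
    reasonable_expectations: str
    data_types_processed: str
    sensitive_data_involved: bool
    possible_consequences: str
    safeguards_in_place: str
    compatible: bool   # conclusion of the case-by-case assessment

def processing_basis_for_new_purpose(assessment: CompatibilityAssessment,
                                     original_basis: str) -> str:
    """A compatible purpose can rely on the original basis; otherwise a new
    processing basis must be determined."""
    return original_basis if assessment.compatible else "new basis required"
```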
Read more about the principle of purpose limitation
Minimise the data processed and ensure its accuracy
According to the GDPR, the personal data processed must be adequate, relevant and limited to what is necessary in relation to the purposes for which the data is processed. This means that personal data may only be collected and processed to the extent that is necessary.
In the development and use of AI systems too, only the personal data that is required for the pre-determined purposes may be processed. An organisation developing or using an AI system must always carefully assess what personal data is required.
The development of an AI system often requires extensive, high-quality datasets to ensure that the model is statistically sound and does not discriminate against certain groups of people. It may therefore be necessary to process personal data in order to avoid bias and errors. This need must also be identified when the purpose of the data is determined.
If the volume of data required at each stage is difficult to assess, start with limited datasets and gradually increase the volume according to justified needs. The relevance of the data must also be monitored and re-assessed at all stages. The assessment must consider whether the same objective could be achieved with synthetic or anonymised data, for example.
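The "start limited, grow on justified need" approach can be sketched as follows, here using scikit-learn and synthetic data purely for illustration: the training set is enlarged only while each increase yields a clear, documented improvement. The threshold and sizes are arbitrary placeholders.

```python
# Illustrative only: synthetic data and arbitrary thresholds stand in for a
# documented, case-by-case assessment of actual need.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)
X_pool, X_val, y_pool, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

best_score = 0.0
n = 500            # start with a deliberately limited dataset
justified_n = n
while n <= len(X_pool):
    model = LogisticRegression(max_iter=1000).fit(X_pool[:n], y_pool[:n])
    score = model.score(X_val, y_val)
    if score - best_score < 0.005:   # a larger volume is no longer justified
        break
    best_score, justified_n = score, n
    n *= 2                           # consider a larger volume next
print(f"Justified training set size: {justified_n}, accuracy: {best_score:.3f}")
```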
The personal data must be accurate and kept up to date when necessary. Ensuring accuracy is especially vital if the AI system uses personal data to make decisions or draw conclusions about the data subjects.
Read more about data minimisation
Read more about data accuracy
Storage limitation and determining the storage period
According to the GDPR, the storage period of personal data must be limited. Personal data may be stored in a format that allows identifying the data subjects only for as long as is necessary for the purpose of the processing.
A time limit must be determined for the data, after which the data is erased, archived or anonymised. When the personal data is no longer needed for the development or use of an AI system, the data must be anonymised or erased.
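A simplified sketch of enforcing a storage time limit, with hypothetical structures: expired records are erased, or anonymised if they are still needed in a non-identifying form. Note that merely removing an identifier is not always sufficient for true anonymisation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

RETENTION_PERIOD = timedelta(days=365)   # hypothetical, purpose-specific limit

@dataclass
class Record:
    subject_id: str | None   # None once the record has been anonymised
    payload: str
    collected_at: datetime

def enforce_retention(records: list[Record], anonymise: bool) -> list[Record]:
    """Erase expired records, or keep them in anonymised form if still needed.

    Dropping the identifier is shown for brevity; true anonymisation requires
    that the data subject can no longer be identified from the remaining data.
    """
    now = datetime.now(timezone.utc)
    kept = []
    for record in records:
        if now - record.collected_at <= RETENTION_PERIOD:
            kept.append(record)          # still within the storage period
        elif anonymise:
            record.subject_id = None     # remove the identifying element
            kept.append(record)
        # otherwise the expired record is dropped, i.e. erased
    return kept
```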
Personal data may be stored for longer than originally planned only if the safeguards required in the GDPR are appropriately in place and the data is processed for one of the following purposes:
- Archiving in the public interest
- Scientific or historical research
- Statistical purposes
Read more about storage limitation
What must be taken into account in automated decision making?
AI systems can be used for automated decision making. Decision making is automated when it is based solely on automated personal data processing and the decisions have legal or similarly significant effects on people.
People have the right not to be subjected to such decision making. However, there are exceptions to this rule (a schematic sketch follows the list below). Automated decision making is allowed if appropriate safeguards are in place and if the decision is
- necessary for the conclusion or performance of a contract between the data subject and the controller;
- approved in legislation to which the controller is subject;
- based on explicit consent given by the data subject.
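The rule and its exceptions can be pictured as a gate in front of fully automated decisions. The sketch below is schematic, with hypothetical names; determining whether an exception actually applies is a legal assessment, not a programmatic one.

```python
from enum import Enum, auto

class Article22Exception(Enum):
    # Simplified labels for the exceptions listed above.
    NECESSARY_FOR_CONTRACT = auto()
    AUTHORISED_BY_LAW = auto()
    EXPLICIT_CONSENT = auto()

def route_decision(has_legal_or_similar_effect: bool,
                   exception: Article22Exception | None,
                   safeguards_in_place: bool) -> str:
    """Allow a fully automated decision only under an exception with safeguards."""
    if not has_legal_or_similar_effect:
        return "automated processing allowed (not a significant decision)"
    if exception is not None and safeguards_in_place:
        return "automated decision allowed, with safeguards"
    return "route to a human decision-maker"

# Example: a significant decision with no applicable exception.
print(route_decision(True, None, True))  # -> route to a human decision-maker
```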
Sensitive data belonging to special categories of personal data, such as health data, can be processed in connection with automated decision making only if the data subject has given their explicit consent to the processing or if the processing is necessary on the basis of a substantial public interest laid down in law.
The result of profiling can be data that belongs to special categories of personal data even if the original data alone was not sensitive. A processing basis must always exist for the processing of such data as well.
In addition, the data subjects must be informed of the logic of the processing and of the consequences the processing has for them. Data subjects must be informed clearly, transparently and in plain language about the practices and principles applied in the processing of their personal data. The organisation must also be able to demonstrate to what extent the AI system made the decision concerning the data subject.
Read more about automated decision making and profiling
Process personal data securely
Both the GDPR and the AI Act highlight the importance of protecting personal data throughout its lifecycle against processing that violates data protection legislation. Organisations must implement appropriate technical and organisational measures to ensure, and to be able to demonstrate, that the personal data processing complies with data protection legislation.
Organisations must be able to ensure the ongoing confidentiality, integrity, availability and resilience of processing systems and services, and to restore the availability of and access to personal data in the event of an incident. In addition, organisations must have a process for regularly testing, assessing and evaluating the effectiveness of these measures in order to ensure the security of the processing.
Such measures may include the following (a brief illustrative sketch follows the list):
- Encryption: Encrypting the personal data during both storage and transfers ensures confidentiality even in the event of a data breach.
- Access right management: Restricted access rights limit who can use and edit personal data.
- Vulnerability testing: Regular vulnerability testing helps identify and remedy any vulnerabilities in the system’s security.
- Log data and auditing: Maintaining detailed log data on the system’s operation enables detecting and investigating any suspicious activity.
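As a small illustration of the encryption and logging measures above, the sketch below uses the widely used cryptography library (Fernet symmetric encryption) and Python's standard logging module. Key management and a real audit infrastructure are out of scope; the key handling shown here is for demonstration only.

```python
import logging

from cryptography.fernet import Fernet   # pip install cryptography

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("personal_data_audit")

# Demonstration only: in production the key must come from a secure key
# management system and never be generated or stored next to the data.
key = Fernet.generate_key()
fernet = Fernet(key)

def store_encrypted(subject_id: str, data: str) -> bytes:
    token = fernet.encrypt(data.encode("utf-8"))
    audit_log.info("Encrypted record stored for subject %s", subject_id)
    return token

def read_decrypted(subject_id: str, token: bytes) -> str:
    audit_log.info("Record accessed for subject %s", subject_id)
    return fernet.decrypt(token).decode("utf-8")

token = store_encrypted("subject-123", "example personal data")
print(read_decrypted("subject-123", token))
```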
AI systems are associated with specific risks that require safeguards supplementing traditional data protection practices.
Such supplementary safeguards include the following (a brief illustrative sketch follows the list):
- Input management: Text, video, sound or images can be used as input for an AI system. Before input data is processed by the AI system, it must be examined for anomalous or harmful content, and any such content must be removed to ensure that it does not conflict with the purpose of the AI system. Other safeguards include controlling and limiting the amount of input data.
- Decision control: The accuracy, fairness and traceability of the AI system’s outputs must be ensured, and any biases must be identified.
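A minimal sketch of input management with hypothetical rules: input volume is limited and the text is screened for patterns that may indicate personal data before it reaches the AI system. Real deployments need far more robust detection.

```python
import re

MAX_INPUT_CHARS = 4_000   # hypothetical limit on input volume
REDACTION_PATTERNS = [
    re.compile(r"\b\d{6}[-+A]\d{3}[0-9A-Z]\b"),   # e.g. Finnish personal identity codes
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),        # e-mail addresses
]

def screen_input(text: str) -> str:
    """Limit and sanitise input before passing it to the AI system."""
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError("Input exceeds the permitted volume")
    for pattern in REDACTION_PATTERNS:
        text = pattern.sub("[REDACTED]", text)     # remove flagged content
    return text

print(screen_input("Contact me at user@example.com about my order."))
# -> "Contact me at [REDACTED] about my order."
```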
Respect people’s data protection rights
Data subjects’ data protection rights are laid down in the GDPR. A data subject is a natural person whose personal data is processed in connection with the development or use of an AI system. Data subjects’ rights include the right to access the data collected on them, the right to have the data rectified or erased, the right to restrict the processing and object to it, and the right to have their data transmitted from one system to another.
Organisations developing or using an AI system must ensure that the system and its mechanisms allow all rights related to the personal data processing in the system to be exercised effectively.
Read more about data subjects’ rights
Demonstrate compliance with data protection legislation
Organisations developing or using an AI system are responsible for complying with the requirements of data protection legislation and for being able to demonstrate their compliance (‘accountability’). Accountability requires implementing certain measures and documenting them.
Read more about accountability and the measures and documentation required
Read more:
Artificial Intelligence Act (EUR-Lex)
General Data Protection Regulation (EUR-Lex)
Information on the AI Act on the Commission's website