The European Data Protection Board set up the ChatGPT Taskforce a year ago to determine whether OpenAI’s handling of personal data complies with the GDPR. A report outlining its preliminary findings has now been released.
The EU is extremely strict about how its citizens’ personal data is used, with GDPR rules explicitly defining what companies can and can’t do with this data.
Do AI companies like OpenAI comply with these laws when they use data in training and operating their models? A year after the ChatGPT Taskforce started its work, the short answer is: maybe, maybe not.
The report stresses that its findings are preliminary and that “it is not yet possible to provide a full description of the results.”
The three main areas the taskforce investigated were lawfulness, fairness, and accuracy.
Lawfulness
To create its models, OpenAI collected public data, filtered it, used it to train its models, and continues to train its models with user prompts. Is this legal in Europe?
OpenAI’s web scraping inevitably scoops up personal data. Under the GDPR, this information may only be used where there is a legitimate interest, and the reasonable expectations people have about how their data is used must be taken into account.
OpenAI says its models comply with Article 6(1)(f) GDPR, which says in part that the use of personal data is legal when “processing is necessary for the purposes of the legitimate interests pursued by the controller or by a third party.”
The report says that “measures should be in place to delete or anonymise personal data that has been collected via web scraping before the training stage.”
OpenAI says it has personal data safeguards in place, but the taskforce counters that “the burden of proof for demonstrating the effectiveness of such measures lies with OpenAI.”
Fairness
When EU citizens interact with companies, they have an expectation that their personal data will be handled properly.
Is it fair that ChatGPT’s Terms and Conditions include a clause saying users are responsible for their chat inputs? The GDPR says an organization cannot transfer its compliance responsibility to the user.
The report says that if “ChatGPT is made available to the public, it should be assumed that individuals will sooner or later input personal data. If those inputs then become part of the data model and, for example, are shared with anyone asking a specific question, OpenAI remains responsible for complying with the GDPR and should not argue that the input of certain personal data was prohibited in the first place.”
The report concludes that OpenAI must be transparent and explicitly tell users that their prompt inputs may be used for training purposes.
Accuracy
AI models hallucinate and ChatGPT is no exception. When it doesn’t know the answer, it sometimes just makes something up. When it delivers incorrect facts about individuals, ChatGPT falls foul of GDPR’s requirement for personal data accuracy.
The report notes that “the outputs provided by ChatGPT are likely to be taken as factually accurate by end users, including information relating to individuals, regardless of their actual accuracy.”
Even though ChatGPT warns users that it sometimes makes mistakes, the taskforce says this is “not sufficient to comply with the data accuracy principle.”
OpenAI is facing a lawsuit because ChatGPT keeps getting a notable public figure’s birthdate wrong.
In its defense, the company stated that the problem cannot be fixed and that people should instead ask for all references to them to be erased from the model.
Last September, OpenAI established an Irish legal entity in Dublin, which now falls under Ireland’s Data Protection Commission (DPC). Under the GDPR’s one-stop-shop mechanism, this shields the company from GDPR challenges brought separately in each individual EU member state.
Will the ChatGPT Taskforce make legally binding findings in its next report? Could OpenAI comply, even if it wanted to?
In their current form, ChatGPT and other models may never be able to completely comply with privacy rules that were written before the advent of AI.