Seminar with Taner Kuru
Data Trawling to Train LLMs: A Lawful Catch?
Abstract
Since its launch in late 2022, OpenAI’s ChatGPT has become a focal point of global conversation, as evidenced by its record-breaking achievement as the fastest-growing consumer application in history, reaching 100 million monthly active users just two months after its debut. Since then, large language models (LLMs), which are trained, among other things, on publicly accessible online personal data, have dominated discussions within the EU data protection domain. Data protection authorities have initiated several investigations concerning, among other things, the processing activities involved in training these models. The establishment of the EDPB’s special task force on ChatGPT in early 2023 underscored the significance of these issues for the EU data protection framework. At the heart of these discussions lies the question of whether, and to what extent, developers of these models have a “legitimate interest” in training their models on publicly accessible online personal data, which determines whether Article 6(1)(f) GDPR can be relied on as the legal basis for this processing activity. After providing an overview of these developments, this talk will explore why, given the relevant case law of the Court of Justice of the European Union (CJEU), this processing activity should be subject to the Article 9 GDPR regime. It will then explain the potential implications of this interpretation and speculate on a possible way forward to ensure the lawfulness of these processing activities within the EU data protection framework, together with its opportunities and challenges for all stakeholders.
Registration
Please register no later than 6 February 2025 at 09:00 CET using this registration form.
Bio
Taner is a PhD researcher at the Tilburg Institute for Law, Technology, and Society (TILT) of Tilburg University, where he researches the ethical and legal implications of investigative genetic genealogy. He is also interested in the implications of novel technologies, such as GenAI, for the EU data protection framework.
Before joining TILT, he completed the Advanced LL.M. in Law and Digital Technologies programme at Leiden University cum laude in 2020, as an awardee of the Jean Monnet Scholarship. He received the European Data Protection Law Review’s “Young Scholar Award” for his article “Genetic Data: The Achilles’ Heel of the GDPR?”, based on his master’s thesis. In 2020 he also worked as an intern at the United Nations Interregional Crime and Justice Research Institute (UNICRI) Centre for Artificial Intelligence and Robotics. Taner is also a certified lawyer registered with the Ankara Bar Association and previously worked as an attorney-at-law in Turkey.
Article “Lawfulness of the mass processing of publicly accessible online data to train large language models”, International Data Privacy Law, 2024