Current challenges in alignment techniques for foundation models
Are current AI safety measures enough? Join us to examine their effectiveness against future threats!
Description:
- Introducing a framework for examining the progression of AI development, focusing on the growth in agency and generality of AI models. This trend suggests that future systems may exhibit failure modes not seen in present-day models.
- Reviewing existing technical safety measures for AI: how they mitigate current failure modes, and whether they can address future ones.
- Presenting safety as a property of the socio-technical system in which technical development takes place, covering defense-in-depth strategies, organizational safety culture, and the role of third-party auditors.
- Introducing BELLS, a practical assessment tool for evaluating the resilience of large language model supervision systems. It is our main technical project and was presented at the ICML conference.
Speaker:
Charbel-Raphaël Ségerie is the executive director of the Centre pour la Sécurité de l’IA (CeSIA), where he leads research and training in advanced AI. He teaches a course on AI safety at the École Normale Supérieure. His work focuses on comprehensively characterising emerging AI risks, interpretability, challenges with current safety methods such as RLHF, and safe-by-design AI approaches. Previously, he conducted research at Inria Parietal and Neurospin.