Swiss AI Research Overview Platform
Machine learning is used successfully today in facial recognition, credit scoring, and the analysis of scientific data. Here, a computer is not programmed with rules that it then merely executes. Rather, the computer "learns" on the basis of data. Although machine learning achieves good results in practice, it also raises questions. One important problem is so-called interpretability: we do not understand how certain kinds of machine learning work, or how a computer recognizes faces, for instance.
In this project, we investigate interpretability from a philosophical perspective, drawing on insights about explanation from the philosophy of science. Our goal is to examine what it means to understand machine learning. We assume that there are several kinds of explanation. Our main aim is to use this assumption to define a systematic framework for the different expectations placed on interpretability. We want to use this framework to analyze and evaluate new approaches to interpretability from computer science.
The project aims to clarify the conditions under which a key technology of our time can become explainable and more broadly acceptable. At the same time, it seeks to understand new developments in the sciences and to sharpen the concept of explanation.
These days, machine learning (ML) is all the rage, in science and beyond. ML is used to pre-process job applications, to automatically recognize faces in images or videos, and to classify astronomical objects, to name just a few examples. However, the ubiquitous use of ML also raises questions and concerns. One of the main problems is interpretability: we lack a theoretical understanding of ML models and, in particular, of so-called deep neural networks (DNNs). For example, why are DNNs so successful in applications? What, exactly, is it that they learn? What are the scope and limits of their success? Since we do not have answers to these questions, we do not understand how DNNs achieve their tasks; they remain black boxes, as it were. This is a particular problem for science, which is supposed to explain and to help us understand phenomena in the world. Science cannot achieve these goals if its tools are black boxes. The problem has also been acknowledged in the political sphere: the EU's “General Data Protection Regulation” postulates a “right to explanation” for automated decision-making. But what does this right amount to? And how can it be granted to citizens?

In the present project, we will address the problem of interpretability from a philosophical perspective. Our main working hypothesis is that philosophical work on explanation and understanding can help us make sense of interpretability and assess approaches that promise a better understanding of ML models. We will thus draw on insights from philosophical research and transfer them to recent discussions about the interpretability of ML. One key philosophical finding is that explanation comes in many varieties and defies a straightforward analysis in terms of necessary and sufficient conditions. This suggests that interpretability is not one thing, but rather comes in many different flavors. Our main aim is thus to establish a conceptual framework for thinking about interpretability. The framework will help researchers to avoid confusion, to become clear about the expectations that lie behind the right to explanation, and to classify and evaluate existing research programs that promise a better understanding of ML.

The conceptual framework will be established by combining a top-down approach with a bottom-up strategy. In the top-down direction, we will draw on distinctions from the philosophy of explanation. In the bottom-up direction, our work will be informed by recent work on ML from computer science. In several case studies, e.g. on Statistical Learning Theory, the so-called Information Bottleneck method, and work on causal descriptions, we will try to fit these approaches into our framework. This will not only help us to test and improve our framework, but also lead to a better understanding of the approaches and their prospects. The framework will in particular distinguish between explanatory and non-explanatory questions about ML, between different levels of generality, and between the subjective and objective components of understanding. In a last step, we will bring this work to bear on the ethics of algorithms.

Of course, as philosophers, we cannot come up with new explanations of how ML works. But we can get clear about what people are after when they call for interpretable ML, we can better understand how current approaches to interpretable software hang together, and we can trace the consequences that ML models have for our philosophical view of scientific research in the 21st century. Our aim is to do just this.
Last updated: 17.07.2023