Self-Explaining Neural Networks: A review with extensions

Abstract

For many applications, understanding why a predictive model makes a certain prediction can be of crucial importance. In the paper "Towards Robust Interpretability with Self-Explaining Neural Networks", Alvarez-Melis et al. propose a model that takes interpretability into account by design. We study the reproducibility and validity of the proposed framework and identify several weaknesses of the approach. Most notably, we find that the model rarely generates good explanations, and that enforcing stable explanations degrades predictive performance more than the authors report. We put forward improvements to the framework that address these weaknesses in a principled way, and show that they enhance the interpretability of the generated explanations.

Type: Publication
Published in: Fairness, Accountability, Confidentiality and Transparency in AI