Peer-reviewed research papers
In lifelong learning, a model learns different tasks sequentially throughout its lifetime. State-of-the-art deep learning models, however, struggle to generalize in this setting and suffer from catastrophic forgetting of old tasks when learning new ones. While a number of approaches have been developed to ameliorate this problem, there is no established, unified or generalized framework for the rigorous evaluation of proposed solutions, a problem that is particularly pronounced in NLP. The few existing benchmarks are typically limited to a specific flavor of lifelong learning, continual open-set classification, where new classes, as opposed to tasks, are learned incrementally. Moreover, the only general lifelong learning benchmark combines a multi-label classification setup with a multi-class classification setup, resulting in misleading gradients during training. We empirically demonstrate that the catastrophic forgetting observed on that benchmark can be attributed to the experimental design rather than to any inherent modeling limitation. To address these issues, we propose an experimental framework for true, general lifelong learning in NLP. Using this framework, we develop a comprehensive suite of benchmarks that target different properties of lifelong learning (e.g., forgetting or intransigence); experiment with diverse facets of language learning: multi-domain, multilingual and different levels of the linguistic hierarchy; and present a continuous evaluation scheme under a new metric, the Area Under the Lifelong Test Curve. Our framework reveals shortcomings of prevalent memory-based solutions, demonstrating that they are unable to outperform a simple experience replay baseline under a realistic lifelong learning setup.
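For concreteness, here is a minimal sketch of the kind of experience replay baseline referred to above, together with a trapezoidal reading of an area-under-the-test-curve metric; the buffer design (reservoir sampling), the names, and the exact metric definition are illustrative assumptions, not the paper's code:

```python
import random

class ReplayBuffer:
    """Fixed-size buffer over the task stream; reservoir sampling keeps a
    uniform sample of everything seen so far."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []
        self.seen = 0

    def add(self, example):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:
            # Replace a stored example with decreasing probability.
            idx = random.randrange(self.seen)
            if idx < self.capacity:
                self.data[idx] = example

    def sample(self, k):
        # Replayed alongside each new-task batch during training.
        return random.sample(self.data, min(k, len(self.data)))

def area_under_lifelong_test_curve(steps, scores):
    """Trapezoidal approximation of test performance integrated over the
    task stream, normalized by the stream length (illustrative reading)."""
    area = sum((t1 - t0) * (s0 + s1) / 2
               for t0, t1, s0, s1 in zip(steps, steps[1:], scores, scores[1:]))
    return area / (steps[-1] - steps[0])
```

The appeal of such a baseline is its simplicity: it adds no task-specific parameters and a single hyperparameter, the buffer capacity.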
We perform text normalization, i.e. the transformation of words from the written to the spoken form, using a memory augmented neural network. With the addition of a dynamic memory access and storage mechanism, we present a neural architecture that serves as a language-agnostic text normalization system while avoiding the kind of unacceptable errors made by LSTM-based recurrent neural networks. By reducing the number of unacceptable mistakes, we show that this novel architecture is indeed a better alternative. Our proposed system also requires significantly less data, training time and compute. Although a few of these errors still occur in certain semiotic classes, we demonstrate that memory augmented networks with meta-learning capabilities can open many doors to a superior text normalization system.
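A minimal sketch of the kind of content-based memory read that underlies such dynamic memory access; the shapes and names are illustrative, not the actual system:

```python
import torch

def memory_read(query, memory):
    """Content-based read: score each memory slot against the query with a
    dot product, softmax the scores, and return the weighted sum of slots.
    query: (batch, hidden), memory: (batch, slots, hidden)."""
    scores = torch.einsum('bh,bnh->bn', query, memory)
    weights = torch.softmax(scores, dim=-1)
    return torch.einsum('bn,bnh->bh', weights, memory)
```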
In this paper, we investigate the application of attention-based hierarchical LSTM architecture variants to the problem of question classification. Question classification is often the first, and an important, step for any question-answering system. We show that while the hierarchical design greatly improves performance over vanilla LSTMs, adding an attention mechanism yields only a slight further improvement. We then change perspective and probabilistically model the question dataset with discrete latent variables, to see whether the given coarse-level categories are re-discovered. While some latent structure is learned, it is not the one we expected. We consider the possible reasons and suggest future improvements.
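As an illustration, a minimal PyTorch sketch of an attention-pooled LSTM classifier in the spirit of the attentive variants studied; the hyperparameters and the six output classes (TREC-style coarse categories) are assumptions:

```python
import torch
import torch.nn as nn

class AttentiveLSTMClassifier(nn.Module):
    """Word-level LSTM whose hidden states are pooled with a learned
    attention distribution before classification."""
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_classes=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.score = nn.Linear(hidden_dim, 1)   # one attention score per step
        self.out = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):                           # (B, T)
        states, _ = self.lstm(self.embed(token_ids))        # (B, T, H)
        weights = torch.softmax(self.score(states), dim=1)  # (B, T, 1)
        sentence = (weights * states).sum(dim=1)            # attention pooling
        return self.out(sentence)
```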
Domain shift occurs when the test distribution differs from the training distribution on the same task, usually degrading predictive performance. We study meta-learning approaches to few-shot domain adaptation for the sentiment classification task, using two representative meta-learning methods, prototypical networks and MAML, alongside a multitask learning baseline. We find that the multitask baseline is quite strong, outperforming both meta-learning methods, although MAML comes close to multitask learning when the domain shift is large. We also find that smart support set selection slightly increases performance for all models studied.
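For reference, the prototypical-network classification rule used in such few-shot episodes looks roughly as follows; the text encoder that produces the embeddings is abstracted away and the function name is hypothetical:

```python
import torch

def proto_probs(support_emb, support_labels, query_emb, num_classes):
    """Class prototype = mean support embedding; queries are classified by
    a softmax over negative squared Euclidean distances to the prototypes."""
    prototypes = torch.stack([support_emb[support_labels == c].mean(dim=0)
                              for c in range(num_classes)])
    sq_dists = torch.cdist(query_emb, prototypes) ** 2  # (queries, classes)
    return torch.softmax(-sq_dists, dim=-1)
```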
Cross-lingual fine-tuning has been widely used to bridge the gap between high-resource and low-resource languages. In this paper, we study how the learned representations evolve during cross-lingual fine-tuning. We fine-tune a pre-trained multilingual BERT on a small Dutch corpus, using a BERT model pre-trained exclusively on Dutch as a comparative baseline. We show that the transferred multilingual BERT learns a different representation subspace than the native model. Additionally, we explore the loss of multilingual capacity during fine-tuning.
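One common way to quantify such subspace differences is linear CKA (Kornblith et al., 2019); the abstract does not name the measure actually used, so this sketch is illustrative only:

```python
import torch

def linear_cka(X, Y):
    """Linear CKA similarity between two representation matrices
    (examples x features); 1 means identical up to rotation and scale."""
    X = X - X.mean(dim=0, keepdim=True)   # center features
    Y = Y - Y.mean(dim=0, keepdim=True)
    cross = torch.linalg.norm(Y.T @ X) ** 2
    return cross / (torch.linalg.norm(X.T @ X) * torch.linalg.norm(Y.T @ Y))
```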
Using probing methods designed for language models, we compare the syntactic representations learned by recurrent and attention-based deep learning models. Surprisingly, we find that recurrent models capture part-of-speech (POS) tag and syntax tree information to a higher degree than attention-based models do. These conclusions hold less strongly for syntax tree information, however, as there are no control tasks for structural probes yet, and other NLP tasks may yield different results. Interesting directions for further work are therefore to explore probing for other syntactic and semantic tasks, and to design control tasks for them.
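A diagnostic probe of this kind boils down to a linear classifier trained on frozen representations; a minimal scikit-learn sketch, with hypothetical names:

```python
from sklearn.linear_model import LogisticRegression

def probe_accuracy(train_reprs, train_tags, test_reprs, test_tags):
    """Train a linear probe on frozen model representations and report how
    much POS information is linearly decodable from them."""
    probe = LogisticRegression(max_iter=1000)
    probe.fit(train_reprs, train_tags)   # representations stay fixed
    return probe.score(test_reprs, test_tags)
```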
For many applications, understanding why a predictive model makes a certain prediction can be of crucial importance. In the paper "Towards Robust Interpretability with Self-Explaining Neural Networks", Alvarez-Melis and Jaakkola propose a model that takes interpretability into account by design. We study the reproducibility and validity of the proposed framework. Several weaknesses of the approach are identified. Most notably, we find that the model rarely generates good explanations, and that performance is compromised more than reported by the authors when explanations are enforced to be stable. We put forward improvements to the framework that address these weaknesses in a principled way, and show that they enhance the interpretability of the generated explanations.
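For context, the self-explaining model of Alvarez-Melis and Jaakkola predicts via f(x) = sum_i theta_i(x) h_i(x); a minimal sketch, with the concept and relevance networks left as placeholders:

```python
import torch
import torch.nn as nn

class SENN(nn.Module):
    """Self-explaining prediction: concepts h(x) weighted by input-dependent
    relevances theta(x); the pair (h, theta) forms the explanation."""
    def __init__(self, concept_net, relevance_net):
        super().__init__()
        self.h = concept_net        # x -> (batch, k) concept activations
        self.theta = relevance_net  # x -> (batch, k, classes) relevances

    def forward(self, x):
        concepts, relevances = self.h(x), self.theta(x)
        logits = torch.einsum('bk,bkc->bc', concepts, relevances)
        return logits, (concepts, relevances)
```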
In this paper, we evaluate techniques for creating sentence representations on the sentiment classification task. We show that word order is important; that Tree-LSTMs outperform their recurrent counterparts, which agrees with the findings of Tai et al. (2015); that sentiment classification becomes harder as sentence length increases; and that supervising sentiment at the node level reduces overfitting but does not improve performance. We also present a method for framing sentiment classification as a regression problem, which has, to the best of our knowledge, not been done for this specific task before. Although this does not improve performance either, it offers a different and useful perspective on the problem. Interesting further work would be to analyze why the regression formulation does not work as well as expected, and to explore the benefits of both perspectives.
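A minimal sketch of the regression framing, assuming SST-style five-point sentiment labels (0 to 4); the scaling and rounding scheme here is one plausible choice, not necessarily the paper's:

```python
import torch
import torch.nn.functional as F

NUM_CLASSES = 5  # assumption: ordinal sentiment labels 0..4

def regression_loss(prediction, class_labels):
    """Train on scalar targets in [0, 1] instead of class logits."""
    target = class_labels.float() / (NUM_CLASSES - 1)
    return F.mse_loss(prediction.squeeze(-1), target)

def to_class(prediction):
    """Map a scalar prediction back to the nearest ordinal class."""
    scaled = prediction.squeeze(-1) * (NUM_CLASSES - 1)
    return scaled.round().clamp(0, NUM_CLASSES - 1).long()
```

A nice side effect of this framing is that the loss penalizes a prediction in proportion to how far it lands from the true label on the sentiment scale, which the usual cross-entropy objective ignores.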
We propose a novel neural network architecture, the Hierarchical Latent Autoencoder, that exploits the underlying hierarchical nature of the CMS Trigger System for data quality monitoring. Given the hierarchical, cascaded design of the CMS Trigger System, the central idea is to learn the probability distribution of the Level 1 Triggers, modelled as hidden archetypes, from the observable High Level Triggers. During evaluation, the learned parameters of the latent distribution can be used to generate a reconstruction probability score. We propose this probability as the anomaly detection metric, since a score bounded between zero and one is easier to interpret when quantifying the severity of a fault. We selected a particular Level 1 Trigger and its corresponding High Level Triggers for our experiments. The results demonstrate that our architecture reduces the reconstruction error on the test set from $9.35 \times 10^{-6}$ with a vanilla Variational Autoencoder to $4.52 \times 10^{-6}$ with our Hierarchical Latent Autoencoder. We thus show that our custom-designed architecture improves the reconstruction capability of variational autoencoders by exploiting the existing hierarchical structure of the CMS Trigger System.
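A sketch of a reconstruction-probability score in the style of An and Cho (2015), which the description above suggests; the Gaussian encoder/decoder interfaces are assumptions:

```python
import math
import torch

def reconstruction_probability(x, encoder, decoder, n_samples=16):
    """Monte-Carlo estimate of the reconstruction probability: sample the
    approximate posterior q(z|x) several times and average p(x|z) under a
    Gaussian decoder. Low scores flag anomalous inputs."""
    mu, logvar = encoder(x)
    log_liks = []
    for _ in range(n_samples):
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        rec_mu, rec_logvar = decoder(z)
        # Gaussian log-density of x under the decoder's output distribution.
        log_liks.append(-0.5 * (((x - rec_mu) ** 2) / rec_logvar.exp()
                                + rec_logvar + math.log(2 * math.pi)).sum(dim=-1))
    stacked = torch.stack(log_liks)                    # (n_samples, batch)
    return (stacked.logsumexp(dim=0) - math.log(n_samples)).exp()
```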
Education
MSc in Artificial Intelligence, 2021
Universiteit van Amsterdam
B.Tech in Computer Science & Engineering, 2019
Vellore Institute of Technology