# Recent Posts

### Catastrophic forgetting in Lifelong learning

An empirical analysis of proposed solutions to the catastrophic forgetting problem. Experiments using Experience Replay, Adapters, and Meta-learning (MAML and OML) to train BERT for lifelong text classification.

### Divergence in Deep Q-Learning: Tips and Tricks

Two tricks are used to train DQNs: a Target Network, which mitigates divergence, and Experience Replay, which facilitates convergence towards better policies.
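As a rough sketch (plain Python with illustrative names, not the post's actual code), the two tricks reduce to a buffer of past transitions, sampled to decorrelate updates, and a lagged copy of the network's weights used for stable bootstrap targets:

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores past transitions so gradient updates draw on decorrelated samples."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # old transitions fall off the end

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # uniform sampling breaks the temporal correlation of consecutive steps
        return random.sample(self.buffer, batch_size)

def sync_target(online_params, target_params, tau=1.0):
    """Hard copy (tau=1) or Polyak-average (tau<1) online weights into the target net."""
    return {k: tau * online_params[k] + (1 - tau) * target_params[k]
            for k in online_params}
```

The target network is only refreshed via `sync_target`, so the bootstrapped Q-targets move slowly even while the online network is updated every step.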

### Verbs as Nouns, Functions as Data

Grokking higher-order functions in Lisp is like breaking down the wall between nouns (data) and verbs (functions).
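In Python terms (an illustration of the idea, not the post's Lisp), treating verbs as nouns means passing, storing, and returning functions like any other value:

```python
def compose(f, g):
    """Build a new verb out of two others: functions are just values here."""
    return lambda x: f(g(x))

double = lambda x: 2 * x
increment = lambda x: x + 1

# a function produced from data (other functions), not written by hand
double_then_increment = compose(increment, double)

# verbs stored in a plain list, applied in sequence like any data
pipeline = [double, increment, double]
def run(pipeline, x):
    for step in pipeline:
        x = step(x)
    return x
```

`compose` never calls `f` or `g` itself; it only assembles them, which is exactly the data-like treatment of functions the post describes.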

### Workflow for Deep Learning Projects

Guidelines and best practices for developing deep learning systems.

### Reimplementing the PyTorch training loop in simple Python

The PyTorch training loop relies on abstractions over data (Dataset, DataLoader) and abstractions over model updates (Parameter, Optimizer). Here we re-implement these abstractions in plain Python to understand them better.

### Building the foundations of Deep Learning from scratch

We implement the foundations of deep learning systems: optimized matrix multiplications for the forward pass and reverse mode auto-differentiation for the backward pass.
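The backward-pass idea can be sketched with a toy reverse-mode autodiff engine (scalar values only, illustrative rather than the post's implementation): each operation records its parents and local derivatives, and `backward()` replays the graph in reverse, applying the chain rule:

```python
class Value:
    """Scalar node in a computation graph; backward() applies the chain rule in reverse."""
    def __init__(self, data, parents=(), grad_fns=()):
        self.data, self.grad = data, 0.0
        self._parents, self._grad_fns = parents, grad_fns

    def __add__(self, other):
        # d(a+b)/da = 1 and d(a+b)/db = 1, so gradients pass through unchanged
        return Value(self.data + other.data, (self, other),
                     (lambda g: g, lambda g: g))

    def __mul__(self, other):
        # d(a*b)/da = b and d(a*b)/db = a
        return Value(self.data * other.data, (self, other),
                     (lambda g: g * other.data, lambda g: g * self.data))

    def backward(self):
        # topological order, then propagate gradients from the output to the inputs
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for p, fn in zip(v._parents, v._grad_fns):
                p.grad += fn(v.grad)  # accumulate: a node may feed several consumers
```

The `+=` accumulation is the crucial detail: a value used twice receives gradient contributions from both uses.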

# Research Reports

### Question Classification using Hierarchical LSTM architectures

In this paper, we investigate the application of attention-based hierarchical LSTM architecture variants to the problem of question classification. Question classification is often the first, and an important, step for any Question-Answering system. We show that although the hierarchical design greatly improves performance over vanilla LSTMs, adding an attention mechanism results in only a slight improvement. Then, we change perspective to probabilistically model the question dataset using discrete latent variables in order to see if the given coarse-level categories are re-discovered. While some latent structure is learned, it isn’t the one we expected. We consider the possible reasons and suggest future improvements.

### Meta-Learning for Few-shot Domain Adaptation

Domain shift occurs when the test distribution differs from the training distribution on the same task, usually degrading predictive performance. We study meta-learning approaches to few-shot domain adaptation for the sentiment classification task. We use two representative meta-learning methods, Prototypical Networks and MAML, along with a multitask learning baseline. We find that the multitask baseline is quite strong, outperforming both meta-learning methods. However, MAML achieves performance close to multitask learning when domain shift is high. We also find that smart support set selection slightly increases performance for all models studied.
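For reference, the core of Prototypical Networks is small enough to sketch in plain Python (an illustration under simplifying assumptions, not the report's code): each class prototype is the mean of its support embeddings, and a query is assigned to the nearest prototype:

```python
import math

def prototypes(support):
    """support: {label: [embedding vectors]} -> {label: mean embedding}."""
    return {label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
            for label, vecs in support.items()}

def classify(query, protos):
    """Assign the query embedding to the class with the nearest prototype."""
    return min(protos, key=lambda label: math.dist(query, protos[label]))
```

In the few-shot setting, only the support-set averages change per episode; the embedding function itself is what gets meta-learned.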

### Evolution of Representations in Cross-Lingual Fine-tuning

Cross-lingual fine-tuning has been widely used to bridge the gap between high-resource and low-resource languages. In this paper, we study the evolution of the learned representations during cross-lingual fine-tuning. We fine-tune a pre-trained multi-lingual BERT on a small Dutch corpus. A BERT model, pre-trained exclusively on Dutch, is used as a comparative baseline. We show that our transferred multi-lingual BERT learns a different representation subspace than the native model. Additionally, we explore the loss in multi-lingual capacity during fine-tuning.

### Probing Language Models

Using probing methods designed for language models, we compare syntactic representations learned by recurrent and attention-based deep learning models. Surprisingly, we find that recurrent models capture POS tag and syntax tree information to a higher degree than attention-based models do. However, these conclusions do not hold as strongly for syntax tree information, as there are no control tasks for structural probes yet, and other NLP tasks may yield different results. Interesting directions for further work are therefore to explore probing for other syntactic and semantic tasks, and to design control tasks for these other tasks.

### Self-Explaining Neural Networks: A review with extensions

For many applications, understanding why a predictive model makes a certain prediction can be of crucial importance. In the paper “Towards Robust Interpretability with Self-Explaining Neural Networks”, Melis et al. propose a model that takes interpretability into account by design. We study the reproducibility and validity of the proposed framework. Several weaknesses of the approach are identified. Most notably, we find that the model rarely generates good explanations, and that performance is compromised more than reported by the authors when enforcing explanations to be stable. We put forward improvements to the framework that address these weaknesses in a principled way, and show that they enhance the interpretability of generated explanations.

### Encoding sentences with neural models

In this paper, we evaluate techniques for creating sentence representations on the sentiment classification task. We show that: word order is important; Tree-LSTMs outperform their recurrent counterparts, which agrees with findings of Tai et al. (2015); sentiment classification is harder when sentence length increases; and that supervising sentiment at the node level decreases overfitting, but does not lead to performance improvements. We present a method for framing the sentiment classification task as a regression problem, which has, to the best of our knowledge, not been done for this specific task before. Although this does not lead to performance improvements, it allows for a different and useful perspective of the problem. Interesting further work would be to analyze why the regression case does not work as well as expected, and to explore the benefits of both perspectives.

### Deep Representation Learning for Trigger Monitoring

We propose a novel neural network architecture called Hierarchical Latent Autoencoder to exploit the underlying hierarchical nature of the CMS Trigger System for data quality monitoring. Given the hierarchical cascaded design of the CMS Trigger System, the central idea is to learn the probability distribution of the Level 1 Triggers, modelled as the hidden archetypes, from the observable High Level Triggers. During evaluation, the learned parameters of the latent distribution can be used to generate a reconstruction probability score. We propose to use this probability metric for anomaly detection since a bounded number from zero to one has better interpretability in quantifying the severity of a fault. We selected a particular Level 1 Trigger and its corresponding High Level Triggers for our experiments. The results demonstrate that our architecture does reduce the reconstruction error on the test set from $9.35 \times 10^{-6}$ when using a vanilla Variational Autoencoder to $4.52 \times 10^{-6}$ when using our Hierarchical Latent Autoencoder. Hence, we successfully show that our custom designed architecture improves the reconstruction capability of variational autoencoders by utilizing the already existing hierarchical nature of the CMS Trigger System.

# Aman Hussain

Into building things, taking risks, and aesthetics.

### Interests

• Machine Learning
• Software Engineering
• Complex Systems

### Education

• MSc. in Artificial Intelligence, 2021

Universiteit van Amsterdam

• B.Tech in Computer Science & Engineering, 2019

Vellore Institute of Technology