In this paper, we investigate variants of attention-based hierarchical LSTM architectures for question classification, which is often the first and a crucial step in any question-answering system. We show that while the hierarchical design greatly improves performance over vanilla LSTMs, adding an attention mechanism yields only a slight further improvement. We then change perspective and probabilistically model the question dataset with discrete latent variables, to test whether the given coarse-level categories are re-discovered. Some latent structure is learned, but it is not the structure we expected. We discuss possible reasons for this and suggest directions for future improvement.
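As a minimal illustration of the attention component mentioned above (a generic sketch, not the paper's implementation), attention pooling replaces the usual last-hidden-state summary of an LSTM with a softmax-weighted average of all hidden states. The learned context vector `v` and the dot-product scoring used here are assumptions for the sketch:

```python
import numpy as np

def attention_pool(H, v):
    """Softmax attention pooling over a sequence of hidden states.

    H: (T, d) array of LSTM hidden states, one row per time step.
    v: (d,) context vector (in practice a learned parameter; here an assumption).
    Returns a (d,) attended summary vector.
    """
    scores = H @ v                          # (T,) relevance score per time step
    weights = np.exp(scores - scores.max()) # numerically stable softmax
    weights /= weights.sum()
    return weights @ H                      # weighted average of hidden states

# Toy usage: three hidden states of dimension 2.
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
v = np.array([1.0, 0.0])
summary = attention_pool(H, v)
```

In a hierarchical setup, the same pooling can be applied twice: once over word-level states to form a sentence vector, and once over sentence-level states to form the final question representation.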