Tehran Institute for Advanced Studies (TEIAS)



Context Mixing in Transformers

Hosein Mohebbi

March 6, 2024
(16 Esfand, 1402)



The event will be held online.

Registration Deadline

March 5, 2024

You may need a VPN to join the talk.


Hosein Mohebbi

Ph.D. Candidate at Tilburg University


In both text and speech processing, variants of the Transformer architecture have become ubiquitous. The key advantage of this neural network topology lies in its modeling of pairwise relations between elements of the input (tokens): the representation of a token at a particular Transformer layer is a function of the weighted sum of the transformed representations of all the tokens in the previous layer. This feature of Transformers is known as ‘context mixing’, and understanding how it functions in specific model layers is crucial for tracing the overall information flow. In this talk, after reviewing prior efforts to quantify context mixing, I will introduce Value Zeroing and show that the token importance scores it produces offer better interpretations than previous analysis methods in terms of plausibility, faithfulness, and agreement with probing. Next, by applying Value Zeroing to models of spoken language, we will see how measures of context mixing can reveal striking differences between the behavior of encoder-only and encoder-decoder speech Transformers.
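To make the idea concrete, here is a minimal, illustrative sketch of Value Zeroing on a toy single-head self-attention layer: zero out one token's value vector, re-run the layer, and measure how much every other token's output representation changes (cosine distance). All names, shapes, and the simplified layer (no layer norm, FFN, or multiple heads) are my own assumptions for illustration, not the exact formulation from the talk.

```python
import numpy as np

def attention_layer(X, Wq, Wk, Wv, zero_value_idx=None):
    """Simplified single-head self-attention (no layer norm / FFN).
    If zero_value_idx is given, that token's value vector is zeroed."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    if zero_value_idx is not None:
        V = V.copy()
        V[zero_value_idx] = 0.0  # remove token j's contribution to mixing
    scores = Q @ K.T / np.sqrt(K.shape[1])
    A = np.exp(scores - scores.max(axis=1, keepdims=True))  # stable softmax
    A /= A.sum(axis=1, keepdims=True)
    return A @ V  # each output row is a mix of the transformed context

def value_zeroing_scores(X, Wq, Wk, Wv):
    """S[i, j]: how much token i's output changes when token j's value
    vector is zeroed -- a proxy for j's contribution to i's representation."""
    n = X.shape[0]
    base = attention_layer(X, Wq, Wk, Wv)
    S = np.zeros((n, n))
    for j in range(n):
        ablated = attention_layer(X, Wq, Wk, Wv, zero_value_idx=j)
        # cosine distance between original and ablated representations
        num = (base * ablated).sum(axis=1)
        den = np.linalg.norm(base, axis=1) * np.linalg.norm(ablated, axis=1) + 1e-9
        S[:, j] = 1.0 - num / den
    return S / (S.sum(axis=1, keepdims=True) + 1e-9)  # normalize each row

# Toy example: 5 tokens, hidden size 8, random weights
rng = np.random.default_rng(0)
n, d = 5, 8
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
scores = value_zeroing_scores(X, Wq, Wk, Wv)
print(scores.shape)  # row i is the context-mixing profile of token i
```

Unlike raw attention weights, which only reflect the mixing coefficients, this ablation-based score also accounts for the content of the value vectors actually being mixed.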


Hosein Mohebbi

Hosein Mohebbi is a PhD candidate in the Department of Cognitive Science and Artificial Intelligence at Tilburg University, Netherlands. He is part of the InDeep consortium project, conducting research on the interpretability of deep neural models for text and speech. He completed his Master’s in Artificial Intelligence at Iran University of Science and Technology, where his research revolved around the interpretation of pre-trained language models and the use of interpretability techniques to accelerate model inference. His research has been published in leading NLP venues such as ACL, EACL, and EMNLP, where he also regularly serves as a reviewer. He received an Outstanding Paper Award at EMNLP 2023. His contributions to the computational linguistics community extend to co-organizing BlackboxNLP (2023, 2024), a popular workshop focused on analyzing and interpreting neural networks for NLP, and offering a tutorial on ‘Transformer-specific Interpretability’ at the EACL 2024 conference in Malta.