On the Expressivity Role of LayerNorm in Transformers' Attention
This repository contains the code for reproducing the results from "On the Expressivity Role of LayerNorm in Transformers' Attention" (Findings of ACL 2023) [PDF].

Setup
Make sure you have a wandb.ai account and that you are logged in on your machine.
Install the required Python packages:
pip install -r requirements.txt
Gurobi is needed to find unselectable keys, and it requires a license. See here.
Hardware
In general, all experiments can run on either a GPU or a CPU.
Code Structure
- The `majority` subdirectory contains the files needed to reproduce the results of the Majority task (Figures 1a, 1b, 2, and 3).
- The `unselectable` subdirectory contains the files needed to reproduce the results of the unselectable experiments (Figures 1c, 1d, and 4; Tables 1 and 2).
Citation
@article{brody2023expressivity,
  title={On the Expressivity Role of LayerNorm in Transformers' Attention},
  author={Brody, Shaked and Alon, Uri and Yahav, Eran},
  journal={arXiv preprint arXiv:2305.02582},
  year={2023}
}