llm-localization
Repository for "The LLM Language Network: A Neuroscientific Approach for Identifying Causally Task-Relevant Units" Paper
The LLM Language Network
A Neuroscientific Approach for Identifying Causally Task-Relevant Units
Website: https://llm-language-network.epfl.ch
Authors: Badr AlKhamissi, Greta Tuckute, Antoine Bosselut*, Martin Schrimpf*
* Equal Supervision
Setup
- Create conda environment:
  conda create -n llm-loc python=3.10
- Activate environment:
  conda activate llm-loc
- Install packages:
  pip install -r requirements.txt
Repository Structure
Root Directory
- localize.py: The main script for localizing selective units within a specified model.
- datasets.py: Handles localizer stimuli loading and processing.
- utils.py: Contains utility functions used throughout the repository.
- model_utils.py: Provides model-related utilities.
- generate_lesion.py: Generates text after applying lesions (ablations) to the localized units.
Folders
- stimuli/: Contains the localizer stimuli required for identifying selective units. The scripts to download the stimuli can be found in scripts/.
- scripts/: Contains bash scripts, including download_lang_stimuli.sh and download_tom_stimuli.sh, which are used to download the localizer stimuli into the stimuli/ folder, as well as example scripts to run localization.
- cache/: Stores the localized units' masks, which can be reused or overwritten based on the script options. A sketch of how a cached mask might be inspected is shown after this list.
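As a quick illustration of how a cached mask could be inspected, the sketch below loads a mask and reports how many units were selected per layer. The file name cache/language_mask.pt and the (num_layers, hidden_size) binary layout are assumptions made for illustration; the actual artifact written by localize.py may use a different name and format.

```python
import torch

# Hypothetical path and layout: a binary tensor of shape (num_layers, hidden_size),
# where 1 marks a selected (e.g., language-selective) unit. The actual file written
# by localize.py may use a different name and format.
mask = torch.load("cache/language_mask.pt")

num_layers, hidden_size = mask.shape
total_selected = int(mask.sum().item())
print(f"Selected {total_selected} of {num_layers * hidden_size} units "
      f"({100 * total_selected / (num_layers * hidden_size):.2f}%)")

# Per-layer breakdown of selected units.
for layer_idx, layer_mask in enumerate(mask):
    print(f"layer {layer_idx:02d}: {int(layer_mask.sum().item())} units")
```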
Arguments Description
- --model-name (str, required):
  Specifies the HuggingFace model name that will be localized.
  Example: --model-name meta-llama/Llama-3.2-1B
- --percentage (float, optional):
  Defines the percentage of units to localize.
  Example: --percentage 1 (localizes 1% of the units)
- --localize-range (str, optional, default: "100-100"):
  Determines the percentile range of units to localize.
  "100-100": Localizes the top-selective units (most selective).
  "0-0": Localizes the least-selective units.
  "x-y": Localizes random units within that percentile range, where y > x.
  Example: --localize-range 80-90
- --network (str, optional, default: "language", choices: ["language", "theory-of-mind", "multiple-demand"]):
  Specifies the network type to localize.
  Example: --network language
- --pooling (str, optional, default: "last-token", choices: ["last-token", "mean"]):
  Defines the method for token aggregation when localizing units (see the sketch after this list):
  "last-token": Uses the last token of the sequence.
  "mean": Averages over all tokens in the sequence.
  Example: --pooling mean
- --num-units (int, optional):
  Specifies the exact number of units to localize. If --percentage is provided, it overrides this argument.
  Example: --num-units 1024
- --seed (int, optional, default: 42):
  Sets the random seed for reproducibility of results.
  Example: --seed 123
- --device (str, optional):
  Specifies the device to use (e.g., "cpu", "cuda:0"). If not provided, the device will be selected automatically.
  Example: --device cuda:0
- --untrained (flag, optional):
  Use this flag to localize units on an untrained version of the model. No arguments needed for this flag.
  Example: --untrained
- --overwrite (flag, optional):
  Use this flag to overwrite an existing mask if a cached version is found.
  Example: --overwrite
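To make the two pooling options concrete, here is a minimal sketch of the aggregation step over a model's hidden states for a single stimulus. It illustrates the general technique only and is not the exact implementation in localize.py.

```python
import torch

# Hidden states for one stimulus at a given layer: (sequence_length, num_units).
# In practice these come from the model's activations on a localizer sentence.
hidden_states = torch.randn(12, 2048)

# "last-token" pooling: take the representation of the final token.
last_token_repr = hidden_states[-1]       # shape: (num_units,)

# "mean" pooling: average the representations over all tokens.
mean_repr = hidden_states.mean(dim=0)     # shape: (num_units,)
```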
Usage Instructions
- Download Localizer Stimuli: Run the following commands to download the required language and theory-of-mind stimuli:
  bash scripts/download_lang_stimuli.sh
  bash scripts/download_tom_stimuli.sh
  The files will be saved in the stimuli directory.
- Run the Localization Script: Use the following command to localize the top language-selective units for your chosen model:
  python localize.py --model-name <model_name> --percentage <percentage> --network <network> --localize-range 100-100 --pooling last-token
  This will create a mask in the cache directory, which can be used to ablate or extract the identified network-selective units. See the scripts folder for examples.
- Run the Generation with Ablation Script: See the example in scripts/generate_lesion.sh. A sketch of how a cached mask can be used for ablation is shown below.
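For intuition on how such a mask can be used for ablation, below is a minimal sketch that zeroes out masked units in a HuggingFace Llama-style model via forward hooks. The mask path (cache/language_mask.pt), its (num_layers, hidden_size) binary layout, and the model name are assumptions for illustration; see generate_lesion.py and scripts/generate_lesion.sh for the repository's actual ablation pipeline.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical mask: binary tensor of shape (num_layers, hidden_size),
# where 1 marks a language-selective unit. The actual cached artifact
# produced by localize.py may differ in name and format.
mask = torch.load("cache/language_mask.pt")

def make_ablation_hook(layer_mask):
    # Zero out the masked units in this layer's output hidden states.
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        keep = (layer_mask == 0).to(dtype=hidden.dtype, device=hidden.device)
        hidden = hidden * keep
        if isinstance(output, tuple):
            return (hidden,) + output[1:]
        return hidden
    return hook

# Attach one hook per decoder layer.
handles = [
    layer.register_forward_hook(make_ablation_hook(mask[i]))
    for i, layer in enumerate(model.model.layers)
]

inputs = tokenizer("The quick brown fox", return_tensors="pt")
with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(generated[0], skip_special_tokens=True))

# Remove the hooks to restore the unablated model.
for h in handles:
    h.remove()
```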
Abstract
Large language models (LLMs) exhibit remarkable capabilities on not just language tasks, but also various tasks that are not linguistic in nature, such as logical reasoning and social inference. In the human brain, neuroscience has identified a core language system that selectively and causally supports language processing. We here ask whether similar specialization for language emerges in LLMs. We identify language-selective units within 18 popular LLMs, using the same localization approach that is used in neuroscience. We then establish the causal role of these units by demonstrating that ablating LLM language-selective units -- but not random units -- leads to drastic deficits in language tasks. Correspondingly, language-selective LLM units are more aligned to brain recordings from the human language system than random units. Finally, we investigate whether our localization method extends to other cognitive domains: while we find specialized networks in some LLMs for reasoning and social capabilities, there are substantial differences among models. These findings provide functional and causal evidence for specialization in large language models, and highlight parallels with the functional organization in the brain.
Citation
@inproceedings{alkhamissi-etal-2025-llm-language-network,
title = "The {LLM} Language Network: A Neuroscientific Approach for Identifying Causally Task-Relevant Units",
author = "AlKhamissi, Badr and
Tuckute, Greta and
Bosselut, Antoine and
Schrimpf, Martin",
editor = "Chiruzzo, Luis and
Ritter, Alan and
Wang, Lu",
booktitle = "Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
month = apr,
year = "2025",
address = "Albuquerque, New Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.naacl-long.544/",
doi = "10.18653/v1/2025.naacl-long.544",
pages = "10887--10911",
ISBN = "979-8-89176-189-6",
}