KnowledgeCircuits
[NeurIPS 2024] Knowledge Circuits in Pretrained Transformers
Knowledge Circuits
Table of Contents
- 🌟Overview
- 🔧Installation
- 📚Get the circuit
- 🧐Analyze Component
- 🌻Acknowledgement
- 🚩Citation
🌟Overview
This work aims to identify the circuits in pretrained language models that are responsible for specific knowledge and to analyze the behavior of the components in those circuits.
🔧Installation
The filtered data for each kind of model is available here. Please download it and put it in the data folder.
Build the environment:
conda create -n knowledgecircuit python=3.10
conda activate knowledgecircuit
pip install -r requirements.txt
❗️The code may fail under torch 2.x.x. We recommend torch 1.x.x.
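Because of the torch version note above, it can help to check which torch is installed before running anything. The following is a minimal sketch, not part of the repository; the helper names `major_version` and `check_torch` are hypothetical, and the check only looks at the major version number.

```python
def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '1.13.1+cu117'."""
    return int(version.split(".")[0])


def check_torch() -> str:
    """Report the installed torch version and flag 2.x releases."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if major_version(torch.__version__) >= 2:
        # The README recommends torch 1.x.x because the code may fail under 2.x.x.
        return f"torch {torch.__version__} detected; torch 1.x.x is recommended"
    return f"torch {torch.__version__} detected"


if __name__ == "__main__":
    print(check_torch())
```

Run it inside the knowledgecircuit environment after installing the requirements.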
📚Get the circuit
Run the following command:
cd acdc
sh run.sh
Here is an example that builds the circuit for country_capital_city in GPT2-Medium.
MODEL_PATH=/path/to/the/model
KT=factual
KNOWLEDGE=country_capital_city
NUM_EXAMPLES=20
MODEL_NAME=gpt2-medium
python main.py --task=knowledge \
--zero-ablation \
--threshold=0.01 \
--device=cuda:0 \
--metric=match_nll \
--indices-mode=reverse \
--first-cache-cpu=False \
--second-cache-cpu=False \
--max-num-epochs=10000 \
--specific-knowledge=$KNOWLEDGE \
--num-examples=$NUM_EXAMPLES \
--relation-reverse=False \
--knowledge-type=$KT \
--model-name=$MODEL_NAME \
--model-path=$MODEL_PATH
The results are written to acdc/factual_results/gpt2-medium.
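To run the same command for several kinds of knowledge, the invocation can be assembled programmatically. This is a minimal sketch under stated assumptions: `build_command` is a hypothetical helper that is not part of the repository, and it only reuses the flags and values shown in the example above, with the variable parts turned into parameters.

```python
import shlex


def build_command(knowledge: str,
                  knowledge_type: str = "factual",
                  model_name: str = "gpt2-medium",
                  model_path: str = "/path/to/the/model",
                  num_examples: int = 20) -> list[str]:
    """Assemble the main.py invocation from the example, one list item per argument."""
    return [
        "python", "main.py", "--task=knowledge",
        "--zero-ablation",
        "--threshold=0.01",
        "--device=cuda:0",
        "--metric=match_nll",
        "--indices-mode=reverse",
        "--first-cache-cpu=False",
        "--second-cache-cpu=False",
        "--max-num-epochs=10000",
        f"--specific-knowledge={knowledge}",
        f"--num-examples={num_examples}",
        "--relation-reverse=False",
        f"--knowledge-type={knowledge_type}",
        f"--model-name={model_name}",
        f"--model-path={model_path}",
    ]


cmd = build_command("country_capital_city")
# Print the command as a single shell-quoted line for inspection.
print(shlex.join(cmd))
```

To execute the command rather than print it, pass the list to `subprocess.run(cmd, check=True, cwd="acdc")` from the repository root, so that it runs in the same directory as the example.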
🧐Analyze Component
Run component.ipynb in the notebook folder.
🌻Acknowledgement
We thank the transformer_lens, ACDC, and LRE projects. The code in this work is built on top of the code from these three projects.
🚩Citation
Please cite our repository if you use Knowledge Circuit in your work. Thanks!
@article{DBLP:journals/corr/abs-2405-17969,
author = {Yunzhi Yao and
Ningyu Zhang and
Zekun Xi and
Mengru Wang and
Ziwen Xu and
Shumin Deng and
Huajun Chen},
title = {Knowledge Circuits in Pretrained Transformers},
journal = {CoRR},
volume = {abs/2405.17969},
year = {2024},
url = {https://doi.org/10.48550/arXiv.2405.17969},
doi = {10.48550/ARXIV.2405.17969},
eprinttype = {arXiv},
eprint = {2405.17969},
timestamp = {Fri, 21 Jun 2024 22:39:09 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2405-17969.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}