mechanistic-interpretability topics

Mechanistically interpretable neurosymbolic AI (Nature Comput Sci 2024): losslessly compressing NNs to computer code and discovering new algorithms which generalize out-of-distribution and outperform...

pauljblazek

deep-distilling

deep-learning

explainable-ai

machine-learning

steering-vectors

61 stars, 5 forks

Steering vectors for transformer language models in PyTorch / Hugging Face

steering-vectors

ai

gpt

huggingface

mechanistic-interpretability

causalgym

40 stars, 5 forks

CausalGym: Benchmarking causal interpretability methods on linguistic tasks

aryamanarora

benchmark

causality

interpretability

mechanistic-interpretability

llm-latent-language

52 stars, 12 forks

Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers".

epfl-dlab

llama2

llm

mechanistic-interpretability

multilingual-nlp