llm-interpretability topic


llama3_interpretability_sae

Stars: 624, Forks: 36, Watchers: 624

A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and fully reproducible.