llm-interpretability topic

llama3_interpretability_sae

624 Stars

Forks

624 Watchers

A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and fully reproducible.

PaulPauls

feature-extraction

feature-steering

llama3

llm-interpretability