codebook-features
Sparse and discrete interpretability tool for neural networks
In `codebook_features/models.py`, I can see a method for attaching codebooks to each attention block's query, key and value vectors: https://github.com/taufeeque9/codebook-features/blob/a37ea8fe7d4d39298aaea042a078d09401396edc/codebook_features/models.py#L1439C1-L1460C36 After training a model with these codebooks attached though, it...
The [rotary embedding issue](https://github.com/neelnanda-io/TransformerLens/issues/385) in transformer_lens breaks the TinyStories codebook models. The issue was introduced in version 1.8.0, so we have frozen the transformer_lens version at 1.7.0. Bump the version...
Currently, there are two ways to format keys in codebook-related dictionaries: 1) "layer{x}_{cb_at}_gcb{y}" (adv name), and 2) "layer{x}_head{y}" (base name). The second was introduced for convenience but shouldn't be used...
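
To make the two key formats concrete, here is a minimal sketch of how base-name keys could be rewritten as adv-name keys. The helpers `base_to_adv` and `normalize_keys` are hypothetical and not part of codebook-features. The sketch assumes that `head{y}` in a base name refers to the same index as `gcb{y}` in the adv name, and it uses `"attn"` as a placeholder default for `cb_at`; both assumptions would need to be checked against the repository.

```python
import re

# Hypothetical helpers for illustration only; not the library's API.
# Adv name:  layer{x}_{cb_at}_gcb{y}
# Base name: layer{x}_head{y}
ADV_RE = re.compile(r"^layer(\d+)_(\w+?)_gcb(\d+)$")
BASE_RE = re.compile(r"^layer(\d+)_head(\d+)$")


def base_to_adv(key: str, cb_at: str = "attn") -> str:
    """Rewrite a base-name key as an adv-name key.

    Assumes head index y maps directly to group index y, and that
    `cb_at` ("attn" here is a placeholder) is supplied by the caller.
    """
    match = BASE_RE.match(key)
    if match is None:
        raise ValueError(f"not a base-name key: {key!r}")
    layer, head = match.groups()
    return f"layer{layer}_{cb_at}_gcb{head}"


def normalize_keys(mapping: dict, cb_at: str = "attn") -> dict:
    """Return a copy of `mapping` in which every key uses the adv format."""
    normalized = {}
    for key, value in mapping.items():
        # Keys already in adv format are kept; base names are converted.
        new_key = key if ADV_RE.match(key) else base_to_adv(key, cb_at)
        if new_key in normalized:
            raise ValueError(f"duplicate key after normalization: {new_key!r}")
        normalized[new_key] = value
    return normalized
```

For example, under these assumptions `normalize_keys({"layer0_head1": [3, 7]})` yields a dictionary keyed by `"layer0_attn_gcb1"`. Raising on unrecognized or colliding keys is a deliberate choice in this sketch, so that mixed-format dictionaries fail loudly instead of silently overwriting entries.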