Joseph Bloom issues

Results: 41 issues by Joseph Bloom

[Bug Report] GatedMLP not in docs.

**Describe the bug** We added gated MLPs when we added LLaMA support (https://github.com/neelnanda-io/TransformerLens/commit/3d03ca5081ff0b7a920ffe7830e2c3da0e6e9d07); however, we didn't update the docs or add tests specifically for the GatedMLP component. It's on me...

[Bug Report] Can't add hook to pretrained model: AssertionError: Cannot add hook blocks.0.hook_q_input if use_split_qkv_input is False

**Describe the bug** Attribution patching demo: **Code example** Please try to provide a minimal example to reproduce the bug. Error messages and stack traces are also helpful. (see patching section...

bug

help wanted

Has anyone trained a tuned lens on Gemma-2b or other Gemma models?

Complete Embedding visualizations

- [x] 1. Dot product between each output action.
- [x] 2. Dot product between each input action.
- [ ] 3. Dot product between each time embedding.
- [x]...
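The checklist items above all reduce to the same computation: a matrix of pairwise dot products between the rows of an embedding table. Below is a minimal sketch in plain Python. The table `toy_embeddings` is a hypothetical stand-in, since the project's actual action and time embeddings are not shown in this preview.

```python
def pairwise_dot_products(embeddings):
    """Return G where G[i][j] is the dot product of embedding rows i and j."""
    return [
        [sum(a * b for a, b in zip(u, v)) for v in embeddings]
        for u in embeddings
    ]


# Hypothetical toy table: three embeddings of dimension two.
toy_embeddings = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]

gram = pairwise_dot_products(toy_embeddings)
```

The resulting matrix is symmetric, so a heatmap of it only needs the upper triangle. Normalizing each row to unit length first would turn the entries into cosine similarities.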

Shapley Values on Attention Heads or Causal Edges Via Ablation

The basic idea is to randomly sample which heads we actually compute, in order to see which ones matter. Shapley values are usually computed over all subsets of heads....
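One common way to make the sampling idea above concrete is the permutation-sampling Monte Carlo estimator for Shapley values. This is an assumption about what the issue has in mind, not the author's method. In the sketch below, `toy_value_fn` is a hypothetical stand-in for "run the model with only these heads active and measure a metric"; the head names and payoffs are invented for illustration.

```python
import random


def shapley_estimates(players, value_fn, num_samples=2000, seed=0):
    """Estimate Shapley values by averaging marginal contributions
    over randomly sampled orderings of the players."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    for _ in range(num_samples):
        order = list(players)
        rng.shuffle(order)
        active = set()
        prev = value_fn(frozenset(active))
        for p in order:
            # Marginal contribution of p given the players already active.
            active.add(p)
            cur = value_fn(frozenset(active))
            totals[p] += cur - prev
            prev = cur
    return {p: total / num_samples for p, total in totals.items()}


# Hypothetical heads, named as (layer, head) tuples.
HEADS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def toy_value_fn(active_heads):
    """Hypothetical metric: head (0, 0) helps on its own, heads (0, 1)
    and (1, 0) only help together, and head (1, 1) does nothing."""
    value = 0.0
    if (0, 0) in active_heads:
        value += 1.0
    if (0, 1) in active_heads and (1, 0) in active_heads:
        value += 2.0
    return value


estimates = shapley_estimates(HEADS, toy_value_fn)
```

In a real setting each call to the value function would be a forward pass with the inactive heads ablated, so the cost scales with `num_samples` times the number of heads rather than with the number of subsets.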

Complete QK/OV Circuit visualizations

QK
- [ ] State to Action.

OV
- [ ] Fix head selection (default to all)
- [ ] Find a way to automatically find axes in the OV circuit...

Fix Ablation Tool

- [ ] Use t-lens naming scheme
- [ ] Enable arbitrary combinations of heads and MLPs

Write Up Analysis of Memory Env Solution.

- [x] Psychological eval
- [ ] Activation Patching for instruction and RTG. -> try to explain
- [ ] Work out how to tackle targets (patching same object multiple...

Reverse Logit Lens

https://www.lesswrong.com/posts/AcKRB8wDpdaN6v6ru/interpreting-gpt-the-logit-lens
https://colab.research.google.com/drive/1MjdfK2srcerLrAJDRaJQKO0sUiZ-hQtA?usp=sharing
pip install git+https://github.com/finetuneanon/transformers/@gpt-neo-localattention

Mega Card: Improve Analysis App in various ways to facilitate better interpretability analysis of the new models

## Analysis features
### Static Composition
- [x] Make composition maps
- [x] Replace composition scores with strip plots?
- [ ] Create a meta-composition score. Something that measures total...

enhancement