Neel Nanda
+1 to Arthur, I think this would be a cool thing for someone to build on top of.
Thanks for the feedback Nix! I think that there are cases where I want a stateful workflow, eg I have a single residual stream SAE that I want to attach to...
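To be concrete, a rough sketch of the kind of stateful workflow I mean, where an SAE gets permanently attached to a residual stream hook point (DummySAE and the layer-6 hook are made-up placeholders for illustration, not real SAE code):

```python
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")

# Hypothetical toy SAE with an encode/decode round trip; the class and its
# interface are placeholders, not the actual SAE implementation.
class DummySAE(torch.nn.Module):
    def __init__(self, d_model, d_hidden):
        super().__init__()
        self.enc = torch.nn.Linear(d_model, d_hidden)
        self.dec = torch.nn.Linear(d_hidden, d_model)

    def forward(self, resid):
        return self.dec(torch.relu(self.enc(resid)))

sae = DummySAE(model.cfg.d_model, 4 * model.cfg.d_model).to(model.cfg.device)

# Stateful workflow: attach the SAE to one residual stream hook point, so
# every subsequent forward pass routes that activation through it.
def replace_with_sae(resid, hook):
    return sae(resid)

model.add_hook("blocks.6.hook_resid_post", replace_with_sae)
logits = model("The SAE is now spliced into the forward pass")
```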
I hear claims that it's basically just the LLaMA architecture! This would make this super easy, woot. https://huggingface.co/01-ai/Yi-34B/discussions/11
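If that's right, then once it's wired up via the LLaMA code path, loading should look like any other model. A hypothetical sketch (whether this exact HF repo id is accepted depends on the support actually landing):

```python
from transformer_lens import HookedTransformer

# Hypothetical usage once Yi-34B support lands via the LLaMA code path;
# the repo id here is the HF one from the linked discussion.
model = HookedTransformer.from_pretrained("01-ai/Yi-34B")
logits, cache = model.run_with_cache("Yi reportedly shares the LLaMA architecture")
```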
IMO this should be just total parameters, for simplicity and alignment with the Pythia suite. Who cares about LayerNorm?
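To spell out the two conventions, a rough sketch (the model name and which matrices get excluded below are just for illustration):

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("pythia-70m")

# "Total parameters": the Pythia-style convention, counting everything.
total = sum(p.numel() for p in model.parameters())

# "Non-embedding parameters": exclude the (un)embedding matrices; LayerNorm
# weights are tiny either way, which is the point above.
non_embedding = total - model.W_E.numel() - model.W_U.numel()

print(total, non_embedding)
```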
It's a bit messy. In my opinion the crucial thing is that the model runs, so fixing bugs 1 and 2 seems important. I'm in general kinda fine with some...
Ah, so the intended behaviour here was that if model_cfg says LayerNormPre, everything works as intended - you're loading an unfolded state dict into a model that expected things to...
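For reference, a rough sketch of the two loading paths I have in mind (argument names are the defaults as I remember them, so double-check against the actual code):

```python
from transformer_lens import HookedTransformer

# Processed load: LayerNorm weights get folded into the subsequent linear
# layers, so the config should end up with normalization_type "LNPre".
folded = HookedTransformer.from_pretrained("gpt2", fold_ln=True)

# Unprocessed load: keeps the original "LN" modules and the unfolded weights.
raw = HookedTransformer.from_pretrained_no_processing("gpt2")

print(folded.cfg.normalization_type, raw.cfg.normalization_type)
```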
Oh man, sorry about that! (and the other inevitable bugs in the guts of core TL code...) I appreciate the heroic effort!
It already supports multiple devices; you need to pass the n_devices parameter to from_pretrained. The bottleneck on Llama 2 70B was grouped query attention, which is currently...
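Concretely, something like the following (the repo id and device count are placeholders for whatever you have available, and the 70B case still needs the grouped query attention support mentioned above):

```python
from transformer_lens import HookedTransformer

# Shard the model across several GPUs by passing n_devices; the repo id and
# device count here are placeholders, not a tested configuration.
model = HookedTransformer.from_pretrained(
    "meta-llama/Llama-2-70b-hf",
    n_devices=4,
)
```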
Thanks for the proposal! I think it's an interesting technique, but in follow-up work we found the Gaussian noise seemed to break the model in some ways, so I wouldn't...
I think it would be cool for this to exist! I unfortunately don't have capacity to make this myself. Looking at past PRs that eg added LLaMA or Gemma should give some...