Neel Nanda
+1 to Arthur, I think this would be a cool thing for someone to build on top of.
Thanks for the feedback Nix! I think that there are cases where I want a stateful workflow, eg I have a single residual stream SAE that I want to attach to...
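To be concrete, a rough sketch of the kind of stateful workflow I mean, where an SAE gets permanently attached to a residual stream hook point (DummySAE and the layer-6 hook are made-up placeholders for illustration, not real SAE code):

```python
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")

# Hypothetical toy SAE with an encode/decode round trip; the class and its
# interface are placeholders, not the actual SAE implementation.
class DummySAE(torch.nn.Module):
    def __init__(self, d_model, d_hidden):
        super().__init__()
        self.enc = torch.nn.Linear(d_model, d_hidden)
        self.dec = torch.nn.Linear(d_hidden, d_model)

    def forward(self, resid):
        return self.dec(torch.relu(self.enc(resid)))

sae = DummySAE(model.cfg.d_model, 4 * model.cfg.d_model).to(model.cfg.device)

# Stateful workflow: attach the SAE to one residual stream hook point, so
# every subsequent forward pass routes that activation through it.
def replace_with_sae(resid, hook):
    return sae(resid)

model.add_hook("blocks.6.hook_resid_post", replace_with_sae)
logits = model("The SAE is now spliced into the forward pass")
```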
I hear claims that it's basically just the LLaMA architecture! This would make this super easy, woot. https://huggingface.co/01-ai/Yi-34B/discussions/11
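If that's right, then once it's wired up via the LLaMA code path, loading should look like any other model. A hypothetical sketch (whether this exact HF repo id is accepted depends on the support actually landing):

```python
from transformer_lens import HookedTransformer

# Hypothetical usage once Yi-34B support lands via the LLaMA code path;
# the repo id here is the HF one from the linked discussion.
model = HookedTransformer.from_pretrained("01-ai/Yi-34B")
logits, cache = model.run_with_cache("Yi reportedly shares the LLaMA architecture")
```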
IMO this should be just total parameters, for simplicity and alignment with the Pythia suite. Who cares about LayerNorm?
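To spell out the two conventions, a rough sketch (the model name and which matrices get excluded below are just for illustration):

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("pythia-70m")

# "Total parameters": the Pythia-style convention, counting everything.
total = sum(p.numel() for p in model.parameters())

# "Non-embedding parameters": exclude the (un)embedding matrices; LayerNorm
# weights are tiny either way, which is the point above.
non_embedding = total - model.W_E.numel() - model.W_U.numel()

print(total, non_embedding)
```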
It's a bit messy. In my opinion the crucial thing is that the model runs, so fixing bugs 1 and 2 seems important. I'm in general kinda fine with some...
Ah, so the intended behaviour here was that if model_cfg says LayerNormPre, everything works as intended - you're loading an unfolded state dict into a model that expected things to...
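For reference, a rough sketch of the two loading paths I have in mind (argument names are the defaults as I remember them, so double-check against the actual code):

```python
from transformer_lens import HookedTransformer

# Processed load: LayerNorm weights get folded into the subsequent linear
# layers, so the config should end up with normalization_type "LNPre".
folded = HookedTransformer.from_pretrained("gpt2", fold_ln=True)

# Unprocessed load: keeps the original "LN" modules and the unfolded weights.
raw = HookedTransformer.from_pretrained_no_processing("gpt2")

print(folded.cfg.normalization_type, raw.cfg.normalization_type)
```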
Oh man, sorry about that! (and the other inevitable bugs in the guts of core TL code...) I appreciate the heroic effort!
It already supports multiple devices; you need to pass the n_devices parameter to from_pretrained. The bottleneck on Llama 2 70B was grouped query attention, which is currently...
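Concretely, something like the following (the repo id and device count are placeholders for whatever you have available, and the 70B case still needs the grouped query attention support mentioned above):

```python
from transformer_lens import HookedTransformer

# Shard the model across several GPUs by passing n_devices; the repo id and
# device count here are placeholders, not a tested configuration.
model = HookedTransformer.from_pretrained(
    "meta-llama/Llama-2-70b-hf",
    n_devices=4,
)
```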
Thanks for the proposal! I think it's an interesting technique, but in follow-up work we found the Gaussian noise seemed to break the model in some ways, so I wouldn't...
I think it would be cool for this to exist! I unfortunately don't have capacity to make this myself. Looking at past PRs that eg added LLaMA or Gemma should give some...