Alan Cooney
You were probably behind a firewall that was preventing an install of `rlax` directly from GitHub. In the latest version it comes from PyPI instead, so this should no longer be...
For future reference, add this as another graphql file to import. Note you may need to add others - these are just the ones we use at Skyhook. ``` #...
You need to enable sourcemaps:

```json
{
  "transform": {
    "^.+\\.tsx?$": [
      "esbuild-jest",
      { "sourcemap": true }
    ]
  }
}
```
This would be very useful for https://github.com/deepmind/meltingpot as well, so that it can also be provided as a package
Thanks for starting on this - it seems useful and I agree that it should be its own file (probably just for DLA, as it'll become quite large once it's...
Seems sensible (and easier to track changes)!
@clarenceluo78 I think the key point here is that a common thing done in TransformerLens is folding the layer norm weights into the next linear layer. See https://github.com/neelnanda-io/TransformerLens/blob/main/further_comments.md#what-is-layernorm-folding-fold_ln for details....
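To illustrate the idea, here is a minimal sketch of the basic fold, assuming a `LayerNorm` followed by a `Linear` (the function name `fold_layernorm` is illustrative, not TransformerLens API; the real `fold_ln` does more, e.g. weight centring). Since `Linear(LayerNorm(x)) = W(γ ⊙ x̂ + β) + b`, the affine parameters can be absorbed as `W_eff = W ⊙ γ` and `b_eff = b + Wβ`, leaving only the parameter-free normalisation to apply at runtime:

```python
import torch


def fold_layernorm(ln: torch.nn.LayerNorm, linear: torch.nn.Linear) -> torch.nn.Linear:
    """Fold LayerNorm's affine params (gamma, beta) into the following Linear.

    Only the learned affine transform is absorbed; the parameter-free
    centre-and-scale step still has to be applied to the input.
    """
    folded = torch.nn.Linear(linear.in_features, linear.out_features)
    with torch.no_grad():
        # W_eff = W * gamma (gamma broadcasts over the input dimension)
        folded.weight.copy_(linear.weight * ln.weight)
        # b_eff = b + W @ beta
        folded.bias.copy_(linear.bias + linear.weight @ ln.bias)
    return folded
```

Applying `folded` to the normalised (but not affine-transformed) input then matches `linear(ln(x))` exactly.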
Seems v. useful for sparse autoencoder training. Docs here - https://pytorch.org/tutorials/intermediate/scaled_dot_product_attention_tutorial.html#conclusion - in case anyone wants to take this (I'll pick it up at some point if no-one does).
> I'd be quite keen to make a start on this soon, @alan-cooney have you made a start already?

I haven't yet, so please feel free to!
Note: Will return to after Mistral PR is merged