Arthur Conmy
For completeness, the generator can then be loaded with
```
import torch
from model import Generator

G = Generator(1024, 512, 8)
# Load the EMA generator weights stored under the 'g_ema' key of the checkpoint
G.load_state_dict(torch.load("stylegan2-ffhq-config-f.pt")['g_ema'])
```
For me (on both GPU and CPU) even GPT-2 attention patterns fail `torch.testing.assert_allclose` with `rtol=atol=1e-6`. Though yeah, `1e-3` is pretty bad. @UFO-101 you're using macOS, are you using CPU or...
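To make the tolerance numbers concrete, here is a minimal pure-Python sketch of the elementwise criterion that allclose-style checks apply: `|actual - expected| <= atol + rtol * |expected|`. The helper name `is_close` is hypothetical and only stands in for the library check; it is not part of PyTorch or TransformerLens.

```python
def is_close(actual, expected, rtol=1e-6, atol=1e-6):
    """Hypothetical helper: the allclose-style criterion for a single pair of floats."""
    # The allowed error grows with the magnitude of the expected value (rtol term)
    # and never drops below the absolute floor (atol term).
    return abs(actual - expected) <= atol + rtol * abs(expected)


# An attention weight that is off by about 1e-3 fails the strict check...
strict = is_close(1.001, 1.0)
# ...and only passes once both tolerances are loosened to 1e-3.
loose = is_close(1.001, 1.0, rtol=1e-3, atol=1e-3)
print(strict, loose)
```

With `rtol=atol=1e-6`, values near 1 must agree to roughly 2e-6, which is already close to float32 rounding noise, so small mismatches at that level are not surprising on their own.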
@nix-apollo which version of TL are you on? We fixed the -1e5 issue [here](https://github.com/neelnanda-io/TransformerLens/pull/366) and [here](https://github.com/neelnanda-io/TransformerLens/pull/389).
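As a generic illustration of why a finite mask value such as -1e5 can be fragile (this is not a reproduction of the exact bug in the linked PRs), here is a small pure-Python sketch comparing it with `-inf`. `masked_softmax` is a hypothetical helper, and the scores are deliberately exaggerated to make the effect visible.

```python
import math


def masked_softmax(scores, mask, ignore_value):
    """Hypothetical helper: softmax where positions with mask=False get ignore_value."""
    filled = [s if keep else ignore_value for s, keep in zip(scores, mask)]
    top = max(filled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in filled]
    total = sum(exps)
    return [e / total for e in exps]


scores = [-2e5, 0.0]   # the only visible position has a very negative score
mask = [True, False]   # second position should be ignored

# With a finite ignore value, the masked position can outrank the visible one.
w_finite = masked_softmax(scores, mask, -1e5)
# With -inf, the masked position always receives exactly zero weight.
w_inf = masked_softmax(scores, mask, float("-inf"))
print(w_finite, w_inf)
```

The point is only that a finite sentinel is compared against real scores, while `-inf` is not; whether this matches the behavior you saw depends on the TL version.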
Another bit of evidence: @bryce13950 states the current failing CI test is due to a Pythia model: > The model specifically that is causing this fail is the model "EleutherAI/pythia-70m",...
@ed1d1a8d ^ the egregious Llama-2 errors are fixed, we think! Only 1e-4 errors remain now. [We are working on](https://github.com/neelnanda-io/TransformerLens/pull/454/files#diff-d3f09bb699c3b05afb6d0cb1102d441eefc4d0f6c2aabdf96ff7d888c43c60aa) the dull task of porting TL functions to match HF exactly which...
I think that this should not be merged, but we should reopen and merge [this PR](https://github.com/neelnanda-io/TransformerLens/pull/315) that fixes the inconsistency that @clarenceluo78 noticed. Unless @neelnanda-io can explain why the behavior...
I'm excited for people to work on adding new architectures to TransformerLens! :) However, your figure is not the most important figure in that paper. None of the models use...
The Google Colab is using this older branch of the TransformerLens code: https://github.com/neelnanda-io/TransformerLens/tree/clean-transformer-demo (from way back when the library was called EasyTransformer). You should either use that branch locally, or not...
@erlebach the relevant line from the Colab is
```
%pip install git+https://github.com/neelnanda-io/Easy-Transformer.git@clean-transformer-demo
```
This doesn't install the latest TransformerLens, but a specific branch. I've made this an issue for the...
CC @UFO-101 who is building a general automated interp library. In my opinion it's better to build a library on top of TL, rather than inside TL. What advantages would...