Joseph Bloom
This worked for me! > For another solution, you might use `match` keyword, and does not need to create more function to handle error > > ``` > // TODO:...
@sheikheddy @neelnanda-io What's the plan here? Do we need an interactive visualization or will something else do?
I've assigned myself to this as I've started trying to debug this. It seems like the most reasonable culprit is deviation in `calculate_sin_cos_rotary` but fixing that doesn't fix compounding deviation...
@bryce13950 Could this help with https://github.com/neelnanda-io/TransformerLens/issues/213?
Thanks for the feedback. I think that sounds good :) We can leave this issue open and mention that if people feel strongly about it, they can make a case....
@ArthurConmy Any ideas here?
Logging levels could definitely help with this; we could add a logger to the package. I was thinking we should move all the tutorials into a folder and actually run...
That's right. Docstrings and Sphinx. It would be nice for examples to show up nicely in Sphinx. On Wed, May 10, 2023, 12:46 AM Peter Hozák ***@***.***> wrote: > I...
@Aprillion Sorry to hear it's been so difficult! Have you been keeping track of each of the different challenges with each install? Maybe some of it is solvable on our...
d_heads is the dimension of the attention heads in multi-headed attention and n_heads is the number of heads per layer. Non-zero d_mlp is a bug I guess, you can turn...
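To make the two quantities concrete, here is a minimal sketch in plain Python. `ToyHeadConfig` and `split_into_heads` are hypothetical names made up for illustration and are not part of any library; the field names simply follow the wording of the comment above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyHeadConfig:
    """Hypothetical config for illustration only (not a library class)."""

    n_heads: int  # number of attention heads per layer
    d_heads: int  # dimension of each individual head

    def concat_width(self) -> int:
        # Width of the vector obtained by concatenating every head's output.
        return self.n_heads * self.d_heads


def split_into_heads(vector, cfg):
    """Split a flat vector into n_heads chunks of d_heads entries each."""
    if len(vector) != cfg.concat_width():
        raise ValueError("vector length must equal n_heads * d_heads")
    return [
        vector[i * cfg.d_heads:(i + 1) * cfg.d_heads]
        for i in range(cfg.n_heads)
    ]


cfg = ToyHeadConfig(n_heads=4, d_heads=3)
flat = list(range(cfg.concat_width()))
heads = split_into_heads(flat, cfg)
```

Each entry of `heads` is the slice that one head would own, so the list has `n_heads` entries of length `d_heads`.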