Bryce Meyer
@JasonGross Do you still have a bit to wrap up on this? If you need any help getting this out the door let me know. If you could add a...
Great! I should be able to review this and get it into a release early next week.
Hey! Sorry for not getting to this earlier. I got pulled away to wrap up a couple things. Looking at it now. Will let you know if anything odd pops...
I am going to go ahead and merge this. The implementation seems to be a bit inaccurate, but I don't think that has anything to do with what has been...
I did a quick experiment in the architecture-specific weight conversions to see whether that was sufficient for tying the weights when needed: https://github.com/TransformerLensOrg/TransformerLens/tree/experiment-gemma-weight-tying. This is something that needs...
@frances720 Sorry for the late reply! It appears that you may be trying to push your branch to the TransformerLens repo? You need to make your PR from your fork....
This has been resolved in a recent release.
This is a lot more complicated than it seems. TransformerLens currently supports 185 models. If you look through the source code of transformers, you will see that every model architecture...
Which models do you want to use? I have helped a couple people get them up and running, and I am sure they would be happy to share their code...
Unfortunately, the activation cache is not currently integrated into the generate function. I don't see any reason why we can't add that as an option,...