Add wrapper integrating HookedTransformer with Google's Learning Interpretability Tool (LIT)

Open neelnanda-io opened this issue 3 years ago • 0 comments

Google have a very cool-looking tool for (mostly non-MI) interpretability of language models, called LIT. It seems designed to be framework-agnostic and to accept a wrapper around many kinds of models, with methods on the wrapper that enable various LIT features. I want to add a wrapper to HookedTransformer so that it can integrate with LIT, ideally supporting as many LIT features as possible.

The MVP I have in mind would just be a Colab that gets LIT working with TransformerLens, and maybe shows some things you can do with it. I'm not sure whether this kind of integration should actually be merged into the library, but I'd love for a small demo to exist!
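To make the wrapper idea a bit more concrete, here is a rough sketch of the shape it might take. It is not a working LIT integration: it deliberately does not import `lit_nlp`, and only mirrors the `input_spec` / `output_spec` / `predict` methods that, as far as I understand, LIT expects a model wrapper to provide (the exact base class and type objects should be checked against the LIT docs). The names `HookedTransformerLitSketch` and `StubModel` are made up for illustration, and the spec values are placeholder strings rather than real LIT types. The wrapped model is assumed to behave like a `HookedTransformer`, i.e. to offer `to_str_tokens(text)` and to return per-token loss when called with `return_type="loss", loss_per_token=True`.

```python
from typing import Any, Dict, Iterable, List


class HookedTransformerLitSketch:
    """Hypothetical wrapper mirroring the shape of a LIT model wrapper."""

    def __init__(self, model: Any):
        # `model` is assumed to behave like a HookedTransformer.
        self.model = model

    def input_spec(self) -> Dict[str, str]:
        # Placeholder strings; a real integration would use LIT type objects.
        return {"text": "TextSegment"}

    def output_spec(self) -> Dict[str, str]:
        # Placeholder strings, as above.
        return {"tokens": "Tokens", "token_loss": "per-token scalars"}

    def predict(self, inputs: Iterable[Dict[str, str]]) -> List[Dict[str, Any]]:
        outputs = []
        for example in inputs:
            text = example["text"]
            tokens = list(self.model.to_str_tokens(text))
            # Assumed shape [batch, n_tokens - 1]: no loss for the first token.
            loss = self.model(text, return_type="loss", loss_per_token=True)
            per_token = [float(x) for x in loss[0]]
            outputs.append({"tokens": tokens, "token_loss": per_token})
        return outputs


class StubModel:
    """Stand-in so the sketch runs without downloading any weights."""

    def to_str_tokens(self, text: str) -> List[str]:
        # Whitespace split plus a fake BOS token; not a real tokenizer.
        return ["<bos>"] + text.split()

    def __call__(self, text: str, return_type: str = "loss",
                 loss_per_token: bool = True) -> List[List[float]]:
        n_tokens = len(self.to_str_tokens(text))
        # Constant dummy loss, one value per predicted token.
        return [[0.5] * (n_tokens - 1)]


if __name__ == "__main__":
    wrapper = HookedTransformerLitSketch(StubModel())
    print(wrapper.predict([{"text": "hello world"}]))
```

In a Colab, `StubModel()` would presumably be swapped for something like `HookedTransformer.from_pretrained("gpt2")`, and the class would subclass LIT's actual model base class with real spec types. Exposing activations from `run_with_cache` as further output fields seems like the natural next step, but I haven't checked how LIT would want those represented.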

Dec 24 '22 02:12 neelnanda-io