Easy-Transformer
Add tests + better docs for tokenization methods
Add tests that the tokenization methods work (to_tokens, to_string, to_str_tokens, get_token_position)
Go through the documentation and clarify things that are unclear (this is hard for me to do, so even just having someone new to the library flag confusions is helpful!). The behaviour of prepend_bos is the main confusion. Docs can be copied from https://colab.research.google.com/github/neelnanda-io/TransformerLens/blob/v2/Main_Demo.ipynb#scrollTo=GUSyRfQuKmHU
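To make the request concrete, here is a minimal sketch of the kind of tests meant above. It assumes a hypothetical ToyTokenizer (a whitespace splitter) whose method names mirror the ones listed in this issue; the signatures, the BOS marker string, and the behaviour are illustrative assumptions, not the real HookedTransformer API. The point is only to show which properties of prepend_bos a test could pin down: it adds exactly one leading token, it shifts every position by one, and it does not leak into the round-tripped string.

```python
from typing import List

# Hypothetical BOS marker; a real tokenizer defines its own special token.
BOS = "<|bos|>"


class ToyTokenizer:
    """Whitespace tokenizer for illustration only; NOT the TransformerLens API."""

    def to_str_tokens(self, text: str, prepend_bos: bool = True) -> List[str]:
        # Split on whitespace and optionally put the BOS marker in front.
        tokens = text.split()
        return [BOS] + tokens if prepend_bos else tokens

    def to_string(self, str_tokens: List[str]) -> str:
        # Join tokens back into text, dropping the BOS marker if present.
        return " ".join(t for t in str_tokens if t != BOS)

    def get_token_position(self, token: str, text: str, prepend_bos: bool = True) -> int:
        # Index of the first occurrence of `token` in the tokenized text.
        return self.to_str_tokens(text, prepend_bos).index(token)


def test_prepend_bos_adds_exactly_one_leading_token() -> None:
    tok = ToyTokenizer()
    with_bos = tok.to_str_tokens("the cat sat", prepend_bos=True)
    without_bos = tok.to_str_tokens("the cat sat", prepend_bos=False)
    assert with_bos[0] == BOS
    assert with_bos[1:] == without_bos


def test_prepend_bos_shifts_positions_by_one() -> None:
    tok = ToyTokenizer()
    pos_with = tok.get_token_position("cat", "the cat sat", prepend_bos=True)
    pos_without = tok.get_token_position("cat", "the cat sat", prepend_bos=False)
    assert pos_with == pos_without + 1


def test_round_trip_ignores_bos() -> None:
    tok = ToyTokenizer()
    text = "the cat sat"
    assert tok.to_string(tok.to_str_tokens(text, prepend_bos=True)) == text


if __name__ == "__main__":
    test_prepend_bos_adds_exactly_one_leading_token()
    test_prepend_bos_shifts_positions_by_one()
    test_round_trip_ignores_bos()
```

Tests against the real library would need a loaded model and its actual tokenizer, so the expected tokens and positions would differ from this toy version.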
I would like to work on this next weekend (unless someone already started).
Updating docs is mostly about the docstrings in https://github.com/neelnanda-io/TransformerLens/blob/main/transformer_lens/HookedTransformer.py#L418 (rendered as https://neelnanda-io.github.io/TransformerLens/transformer_lens.html#transformer_lens.HookedTransformer.HookedTransformer.to_tokens), right? If there are external docs to update (copying from the Main_Demo notebook to somewhere other than the docstrings), please let me know.
That's right. Docstrings and Sphinx. It would be nice for examples to show up nicely in Sphinx.
Notes to self:
- I was not able to install the dev version on either my Windows machine or my M2 Mac, with either DevContainer or poetry => using GH Codespaces on the auto-installed Ubuntu 20.04.4 LTS, Python 3.8.10
- state of tests on main before adding anything:
  - unit: 231 passed, 2 skipped, 1 warning in 14.29s
  - acceptance: bailed out after 60% completion, last line: tests/acceptance/test_transformer_lens.py::test_model[opt-125m-6.159054279327393] make: *** [makefile:19: acceptance-test] Killed
Not updating docstrings yet; I will contact people on Slack to schedule a chat/call later about:
- docs on the main branch are not up to date: running the code from docs/README.md modifies a bunch of files even without any changes to docstrings from me
- there are changes to the docstrings in https://github.com/neelnanda-io/TransformerLens/pull/274 => best to wait for those to be merged before making minor formatting changes
- the purpose of prepend_bos is actually quite clear to me as a survivor of UTF-8 text file encodings with/without BOM 🤷, so I will need to sleep on it a few more times for ideas; probably a link from all methods that have prepend_bos to some standalone section in the docs about this special token 🤔
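For readers who have not run into the BOM problem, the analogy can be shown with the Python standard library alone. This is only an illustration of the byte order mark, not of prepend_bos itself: an optional leading marker changes the length and offsets of the data, and the reader has to know whether to expect it.

```python
import codecs

text = "hello"

# Plain UTF-8: no marker at the start of the byte string.
plain = text.encode("utf-8")

# "utf-8-sig" writes the 3-byte BOM before the actual content.
with_bom = text.encode("utf-8-sig")

assert with_bom == codecs.BOM_UTF8 + plain
assert len(with_bom) == len(plain) + 3

# A reader that does not expect the marker keeps it as a stray character...
assert with_bom.decode("utf-8") == "\ufeff" + text

# ...while a reader that expects it strips it again.
assert with_bom.decode("utf-8-sig") == text
```

The same kind of off-by-one surprise is what a standalone docs section about the BOS token could spell out: whether the extra leading token is present, and how that shifts token positions.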
@Aprillion Sorry to hear it's been so difficult! Have you been keeping track of each of the different challenges with each install? Maybe some of it is solvable on our end...
Let's chat on slack about meeting :)
@Aprillion I got it working on my M2 with poetry rather than dev containers after a lot of fiddly work. IIRC the biggest issue was that I had to downgrade from Python 3.11 to 3.10 due to an Apple silicon incompatibility with some package (maybe PyTorch?)