Add tests + better docs for tokenization methods

Open neelnanda-io opened this issue 2 years ago • 5 comments

Add tests that the tokenization methods work (to_tokens, to_string, to_str_tokens, get_token_position)

Go through the documentation and clarify things that are unclear (this is hard for me to do, so even just having someone new to the library flag confusions is helpful!). The behaviour of prepend_bos is the main confusion. Docs can be copied from https://colab.research.google.com/github/neelnanda-io/TransformerLens/blob/v2/Main_Demo.ipynb#scrollTo=GUSyRfQuKmHU
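
As a rough illustration of the kind of properties such tests could pin down, here is a minimal sketch built on a toy whitespace tokenizer. Everything in it is hypothetical: `ToyTokenizer`, the `<bos>` marker, and the three-word vocabulary are stand-ins written for this example, not the TransformerLens implementation. Only the four method names and the `prepend_bos` flag are borrowed from the issue, and the real methods may take different arguments.

```python
from typing import List

BOS = "<bos>"  # hypothetical BOS marker, used by this toy only


class ToyTokenizer:
    """Whitespace tokenizer standing in for a real model tokenizer (toy only)."""

    def __init__(self, vocab: List[str]) -> None:
        # Token id 0 is reserved for the BOS marker in this toy.
        self.id_to_str = [BOS] + list(vocab)
        self.str_to_id = {s: i for i, s in enumerate(self.id_to_str)}

    def to_tokens(self, text: str, prepend_bos: bool = True) -> List[int]:
        ids = [self.str_to_id[word] for word in text.split()]
        return [self.str_to_id[BOS]] + ids if prepend_bos else ids

    def to_str_tokens(self, text: str, prepend_bos: bool = True) -> List[str]:
        return [self.id_to_str[i] for i in self.to_tokens(text, prepend_bos)]

    def to_string(self, tokens: List[int]) -> str:
        return " ".join(self.id_to_str[i] for i in tokens)

    def get_token_position(self, token: str, text: str, prepend_bos: bool = True) -> int:
        # Position of the first occurrence of `token` in the tokenized text.
        return self.to_str_tokens(text, prepend_bos).index(token)


tok = ToyTokenizer(["the", "cat", "sat"])
with_bos = tok.to_tokens("the cat sat", prepend_bos=True)
without_bos = tok.to_tokens("the cat sat", prepend_bos=False)

# Property 1: prepend_bos adds exactly one token, at position 0.
assert with_bos[1:] == without_bos
# Property 2: without BOS, to_string round-trips the input text.
assert tok.to_string(without_bos) == "the cat sat"
# Property 3: token positions shift by one when BOS is prepended.
assert tok.get_token_position("cat", "the cat sat", prepend_bos=True) == 2
assert tok.get_token_position("cat", "the cat sat", prepend_bos=False) == 1
```

The third property is the one most likely to surprise a newcomer: the same word sits at a different index depending on `prepend_bos`.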

neelnanda-io avatar Dec 19 '22 11:12 neelnanda-io

I would like to work on this next weekend (unless someone already started).

Updating docs is mostly about the docstrings in https://github.com/neelnanda-io/TransformerLens/blob/main/transformer_lens/HookedTransformer.py#L418 (rendered as https://neelnanda-io.github.io/TransformerLens/transformer_lens.html#transformer_lens.HookedTransformer.HookedTransformer.to_tokens), right? If there are external docs to update (copy from the Main_Demo notebook to somewhere other than the docstring), please let me know.

Aprillion avatar May 09 '23 14:05 Aprillion

That's right. Docstrings and Sphinx. It would be nice for examples to show up nicely in Sphinx.

jbloomAus avatar May 09 '23 21:05 jbloomAus

Notes to self:

  • I was not able to install the dev version on either Windows or my M2 Mac, with either DevContainer or poetry => using GH Codespaces on the auto-installed Ubuntu 20.04.4 LTS, Python 3.8.10
  • state of tests on main before adding anything:
    • unit: 231 passed, 2 skipped, 1 warning in 14.29s
    • acceptance bailed out after 60% completion, last line: tests/acceptance/test_transformer_lens.py::test_model[opt-125m-6.159054279327393] make: *** [makefile:19: acceptance-test] Killed

Not updating docstrings yet, I will contact people on Slack to schedule some chat/call later about:

  • docs on the main branch are not up to date: running the code from docs/README.md modifies a bunch of files even without any docstring changes from me
  • there are changes to the docstrings in https://github.com/neelnanda-io/TransformerLens/pull/274 => best to wait for those to be merged before doing minor formatting changes
  • purpose of prepend_bos is actually quite clear to me as a survivor of UTF-8 text file encodings with/without BOM 🤷 .. I will need to sleep on it a few more times for ideas - probably a link from all methods that have prepend_bos to some standalone section in the docs about this special token 🤔
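
For readers who have not run into the BOM issue mentioned above, the following sketch shows it using only Python's standard library. It is meant as a loose analogy, not a description of how prepend_bos is implemented: a BOM is a byte-level marker at the start of a file, while a BOS token is a beginning-of-sequence token at the start of a model input. What the two share is that an optional leading marker changes lengths and positions, and that both sides have to agree on whether it is present.

```python
import codecs

text = "hello"

# "utf-8" writes no marker; "utf-8-sig" prepends the 3-byte UTF-8 BOM.
plain = text.encode("utf-8")
with_bom = text.encode("utf-8-sig")
assert with_bom == codecs.BOM_UTF8 + plain

# A decoder that does not expect the marker keeps it as a leading U+FEFF character,
# so the decoded string is one character longer and every index shifts by one.
decoded_naive = with_bom.decode("utf-8")

# A decoder that expects the marker strips it and returns the original text.
decoded_aware = with_bom.decode("utf-8-sig")

print(repr(decoded_naive), len(decoded_naive))
print(repr(decoded_aware), len(decoded_aware))
```

The shifted indices in `decoded_naive` are the part that maps onto tokenization: a position computed with the marker present is off by one when it is used against a sequence without it.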

Aprillion avatar May 13 '23 15:05 Aprillion

@Aprillion Sorry to hear it's been so difficult! Have you been keeping track of each of the different challenges with each install? Maybe some of it is solvable on our end...

Let's chat on slack about meeting :)

jbloomAus avatar May 14 '23 08:05 jbloomAus

@Aprillion I got it working on my M2 with poetry rather than dev containers after a lot of fiddly work. IIRC the biggest issue was that I had to downgrade from Python 3.11 to 3.10 due to an Apple silicon incompatibility with some package (maybe PyTorch?)

luciaquirke avatar May 15 '23 07:05 luciaquirke