
[docs] Whitespace

Open · stevhliu opened this pull request 6 months ago • 4 comments

Improves documentation for the Whitespace function (see https://github.com/huggingface/transformers/issues/38180 for more details)
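
For context, here is a quick illustration of the behavior the updated docs are meant to clarify, as a minimal sketch against the tokenizers Python API (the output shown is illustrative):

    from tokenizers.pre_tokenizers import Whitespace

    # Whitespace splits on the regex \w+|[^\w\s]+, i.e. on word and punctuation
    # boundaries, not only on whitespace characters.
    pre_tok = Whitespace()
    print(pre_tok.pre_tokenize_str("Hello, world!"))
    # [('Hello', (0, 5)), (',', (5, 6)), ('world', (7, 12)), ('!', (12, 13))]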

cc @itazap

stevhliu · May 27 '25 22:05

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

You need to run python stub.py!

ArthurZucker · May 28 '25 07:05

Awesome! 💯

itazap · May 28 '25 14:05

Hmm, having a bit of trouble here? 😅

  • I had to make some changes because running python stub.py fails with python: can't open file '/Users/stevhliu/tokenizers/stub.py': [Errno 2] No such file or directory.
  • Running python bindings/python/stub.py instead creates a new py_src/tokenizers/ directory (with a bunch of empty files) rather than writing the changes into the existing one.
  • So I modified stub.py to write to "bindings/python/py_src/tokenizers/", but then it just overwrites everything in those files with a blank line, which doesn't seem right.

stevhliu · May 28 '25 18:05

I'll merge this and fix it in a follow-up. The stub.py output is right even if it's just a newline.

Narsil · Jun 17 '25 14:06

Oh, you may have forgotten to rebuild tokenizers: stub.py extracts the .pyi from the built binary, so if your binary is outdated you may not see the new strings.
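
In other words: rebuild the extension first, then regenerate the stubs. A minimal sketch of that flow, assuming an editable install of the Python bindings (the exact build command depends on your local setup and isn't spelled out in this thread):

    cd bindings/python
    pip install -e .   # rebuild the Rust extension so the binary contains the updated docstrings
    python stub.py     # regenerate the .pyi stubs from the freshly built binary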

Narsil · Jun 17 '25 14:06