YouTokenToMe icon indicating copy to clipboard operation
YouTokenToMe copied to clipboard

Unsupervised text tokenizer focused on computational efficiency

Results 42 YouTokenToMe issues
Sort by recently updated
recently updated
newest added

We are using big endian numbers under the hood. Rule's x, y and z are written by plane, not interleaved. Signed-off-by: Vadim Markovtsev

enhancement

SentencePiece implementation includes a text normalisation stage which is very useful in reducing the number of individual characters, handling non-printable characters and more. This feature may speed up the run...

enhancement

This PR resolves #65 #44, which implemented the custom tokens feature. Training BPE is intact, and custom tokens are just added to the model file. The tokens are used during...

Thanks for your work on this package but it seems it's falling behind in maintenance. It needs upgraded `Cython` as build dependency, and requires downgraded Python (3.9.16) in some cases....

reduced from a more complex example: ``` conda create -n tmp2 python=3.10 conda activate tmp2 pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 pip install youtokentome ``` then: `python -c 'import...

Visual studio c++ 14.0 error while installing. I have c++ 14.3 but still having issue with this.

is there a way to access the special tokens via the object of youtokenme.BPE(), it's in ```yytm.pyx``` file and it's object is created inside the train method of the same...