python-bpe icon indicating copy to clipboard operation
python-bpe copied to clipboard

Byte Pair Encoding for Python!

Results 7 python-bpe issues
Sort by recently updated
recently updated
newest added

Add encoding, ensure_ascii and indent parameter for Vietnamese language use case.

The pythonpsum website is no longer functional nor searchable.

I want to instal bpe by python2.7, but mypy requires python3.5 later

How to use this library to generate an encoder.json file such as the ones used for GPT-2 model ?

The tokenizer makes for a better default (faster and saner). Targets #10

Simple example would be to import `word_tokenize` from `tok` instead of from `nltk`. See: https://github.com/kootenpv/tok

If a pair is the best in frequency, their combination should reduce the count of both single tokens respectively, which may result in the single tokens' difficulty in keeping high...