python-bpe
python-bpe copied to clipboard
Byte Pair Encoding for Python!
Add encoding, ensure_ascii and indent parameter for Vietnamese language use case.
The pythonpsum website is no longer functional nor searchable.
I want to instal bpe by python2.7, but mypy requires python3.5 later
How to use this library to generate an encoder.json file such as the ones used for GPT-2 model ?
The tokenizer makes for a better default (faster and saner). Targets #10
Simple example would be to import `word_tokenize` from `tok` instead of from `nltk`. See: https://github.com/kootenpv/tok
If a pair is the best in frequency, their combination should reduce the count of both single tokens respectively, which may result in the single tokens' difficulty in keeping high...