python-bpe
Consider using `tok` as the tokenizer; it is faster and more customizable.
A simple example would be to import `word_tokenize` from `tok` instead of from `nltk`.
See: https://github.com/kootenpv/tok
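A rough sketch of what the swap could look like, so the project keeps working when `tok` is not installed. The helper names `get_word_tokenize` and `regex_word_tokenize` are hypothetical and not part of python-bpe, `tok`, or `nltk`; the only library detail assumed is that both packages expose a top-level `word_tokenize`, as described above.

```python
import importlib
import re


def regex_word_tokenize(text):
    # Hypothetical last-resort tokenizer: runs of word characters,
    # or single punctuation marks. Not equivalent to tok or nltk.
    return re.findall(r"\w+|[^\w\s]", text)


def get_word_tokenize(preferred=("tok", "nltk")):
    # Return `word_tokenize` from the first importable package in `preferred`,
    # falling back to the regex tokenizer if none is available.
    for name in preferred:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue
        tokenizer = getattr(module, "word_tokenize", None)
        if callable(tokenizer):
            return tokenizer
    return regex_word_tokenize


# Prefer tok, then nltk; nothing is tokenized at import time.
word_tokenize = get_word_tokenize()
```

If a hard dependency on `tok` is acceptable, the whole change reduces to replacing `from nltk import word_tokenize` with `from tok import word_tokenize`.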
Hey @kootenpv, thanks for the suggestion! Care to put up a PR?