alvations
While this is a good speedup, it adds an extra dependency. I'll avoid it; users looking for faster alternatives should be able to find `StringZilla` or similar libraries.
Thanks @jeslinpjames! The CI/CD checks were failing and the failures were hard to debug. After the patches in #3280 and #3274, the files have black applied. Feel free to reopen this...
I think it is indeed a bug. Another user found that indirectly overriding `__class__` works around the normalizer going missing after modification: https://stackoverflow.com/questions/78612251/how-do-we-add-modify-the-normalizer-in-a-pretrained-huggingface-tokenizer/78624238#78624238
The test will still fail intermittently unless we do something about the non-deterministic output checks. I'll dig up the specific doctest or unittest.
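For illustration, here is a minimal sketch of one common way to stabilise a check like this. It assumes the flakiness comes from unordered output such as a set, which may not be the cause in the test in question. `unique_tokens` is a hypothetical helper written for this example, not a function from the project.

```python
import doctest


def unique_tokens(text):
    """Return the distinct whitespace-separated tokens in ``text``.

    Hypothetical helper, used only to illustrate the idea. The result is a
    set, so its printed order is not guaranteed. Sorting before display
    keeps the doctest stable across runs.

    >>> sorted(unique_tokens("b a b c"))
    ['a', 'b', 'c']
    """
    return set(text.split())


if __name__ == "__main__":
    # Runs the doctest in the docstring above; prints nothing when it passes.
    doctest.testmod()
```

Comparing `sorted(...)` output instead of the raw set makes the expected value independent of iteration order. If the non-determinism has another source (random seeds, floating-point noise), a different fix would be needed.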
Thanks for the PR! Pointing to HPLT is correct. P.S. Though sacremoses and some nltk tokenizers are written in the same style, especially the Penn Treebank tokenizer part, it wasn't...