nlp_primitives
Support Unicode
As a user, I wish NLP Primitives could handle Unicode text.
Currently, Unicode text is not correctly handled by regexes in nlp_primitives. For example, Àbc is not recognized as a title word by TitleWordCount (Abc is).
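A minimal sketch of the kind of mismatch described above. The ASCII-only pattern here is an assumption made for illustration and is not necessarily the regex nlp_primitives uses; both function names are hypothetical. The Unicode-aware variant relies on `str.istitle()`, which follows Unicode case properties.

```python
import re

# Hypothetical ASCII-only pattern, for illustration only; the actual
# regex inside nlp_primitives may differ.
ASCII_TITLE_WORD = re.compile(r"\b[A-Z][a-z]*\b")


def count_title_words_ascii(text):
    """Count title words with an ASCII-only character class."""
    return len(ASCII_TITLE_WORD.findall(text))


def count_title_words_unicode(text):
    """Count title words using Unicode-aware case checks."""
    # \w matches Unicode word characters for str patterns in Python 3,
    # and str.istitle() uses Unicode case properties.
    return sum(1 for word in re.findall(r"\w+", text) if word.istitle())


# "\u00c0bc" is "Àbc" written with the precomposed character U+00C0.
sample = "\u00c0bc Abc abc"
ascii_count = count_title_words_ascii(sample)      # only "Abc" matches
unicode_count = count_title_words_unicode(sample)  # "Àbc" and "Abc" match
```

Note that a decomposed spelling of the same word (A followed by the combining grave accent U+0300) behaves differently under both approaches, so normalizing the input first (for example with `unicodedata.normalize("NFC", text)`) may also be worth considering.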
@sbadithe Is it possible to make a pytest fixture and have it be used by all the NL primitives? That way, if we add more NL primitives in the future, we can make sure they support Unicode.
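One possible shape for such a fixture, as a rough sketch rather than a proposal for the actual test layout. The sample pairs, the fixture name `unicode_ascii_pair`, and the stand-in `title_word_count` function are all hypothetical; a real test would call the primitive under test instead of the stand-in. The idea is that each test receives a Unicode string together with an ASCII counterpart of the same shape, and a case-based primitive should treat the two the same way.

```python
import pytest

# Illustrative (unicode_text, ascii_counterpart) pairs; assumed samples,
# not taken from the nlp_primitives test suite.
UNICODE_ASCII_PAIRS = [
    ("\u00c0bc", "Abc"),                       # Àbc
    ("\u00c9cole Normale", "Ecole Normale"),   # École Normale
    ("\u00fcber Alles", "uber Alles"),         # über Alles
]


@pytest.fixture(params=UNICODE_ASCII_PAIRS)
def unicode_ascii_pair(request):
    """Provide one (unicode_text, ascii_counterpart) pair per test run."""
    return request.param


def title_word_count(text):
    # Stand-in for the primitive under test; NOT the nlp_primitives
    # implementation. Swap in the real primitive's function here.
    return sum(1 for word in text.split() if word.istitle())


def test_title_word_count_matches_ascii(unicode_ascii_pair):
    unicode_text, ascii_text = unicode_ascii_pair
    # A Unicode-aware primitive should give the same result for both.
    assert title_word_count(unicode_text) == title_word_count(ascii_text)
```

Placing the fixture in a shared `conftest.py` would make it available to every primitive's test module without imports, so a newly added primitive only needs one test function that requests it.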