Support Unicode

Open · sbadithe opened this issue 2 years ago · 1 comment

As a user, I wish NLP Primitives could handle Unicode text.

Currently, Unicode text is not correctly handled by regexes in nlp_primitives.

For example, TitleWordCount does not recognize Àbc as a title word, although it does recognize Abc.
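A minimal sketch of how this kind of failure can arise, assuming the title-word check relies on an ASCII-only character class. The pattern and both helper functions below are hypothetical illustrations, not the actual nlp_primitives implementation:

```python
import re

# Hypothetical ASCII-only pattern: an uppercase A-Z letter followed by
# lowercase a-z letters. This is an assumption for illustration, not the
# regex that nlp_primitives actually uses.
ASCII_TITLE = re.compile(r"\b[A-Z][a-z]*\b")


def count_title_words_ascii(text):
    """Count title words using the ASCII-only pattern."""
    return len(ASCII_TITLE.findall(text))


def count_title_words_unicode(text):
    """Count title words using str methods, which follow Unicode case rules."""
    count = 0
    for word in text.split():
        first, rest = word[:1], word[1:]
        # A title word starts with an uppercase letter and has no other
        # uppercase letters after it.
        if first.isupper() and (rest == "" or rest.islower()):
            count += 1
    return count


print(count_title_words_ascii("Abc"))    # 1
print(count_title_words_ascii("Àbc"))    # 0, because À is outside [A-Z]
print(count_title_words_unicode("Àbc"))  # 1
```

The second helper is one possible Unicode-aware alternative; a regex-based fix would need character classes that are not limited to `A-Z` and `a-z`.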

sbadithe · Jul 26 '22 17:07

@sbadithe Is it possible to create a pytest fixture that all the NL primitives use in their tests? That way, if we add more NL primitives in the future, we can make sure they support Unicode.
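A rough sketch of what such a shared fixture might look like. The sample strings, the fixture name `unicode_text`, and the stand-in `word_count` function are all hypothetical; in a real test suite the fixture would probably live in a `conftest.py` and the test would call an actual primitive:

```python
import pytest

# Hypothetical sample inputs; each has exactly two whitespace-separated words.
UNICODE_SAMPLES = [
    "Àbc dèf",
    "Ünïcödé Tëxt",
    "Привет мир",
    "日本語 テキスト",
]


@pytest.fixture(params=UNICODE_SAMPLES)
def unicode_text(request):
    """Parametrized fixture: each test that uses it runs once per sample."""
    return request.param


def word_count(text):
    """Stand-in for a real primitive (an assumption for this sketch)."""
    return len(text.split())


def test_primitive_handles_unicode(unicode_text):
    # Every sample above contains two words, so a Unicode-safe
    # word counter should return 2 for each of them.
    assert word_count(unicode_text) == 2
```

Because the fixture is parametrized, any new primitive's test that requests `unicode_text` is exercised against every sample without extra boilerplate.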

gsheni · Jul 26 '22 17:07