proustr

tokenization in French

Open lvaudor opened this issue 4 years ago • 0 comments

Hi Colin,

I'm using tidytext for tokenization, but I'm running into problems with French text. For instance, elided forms such as "L'achat" or "j'ai" are not split at the apostrophe as they should be. In an issue regarding tidytext, you mentioned that you were working on a tokenizer that would work well for French, and I got the impression that it was intended for the proustr package. Can you tell me more about it?
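To make the expected behaviour concrete, here is a minimal sketch in Python of the kind of split I have in mind. It is only an illustration, not tidytext or proustr code: the function name `tokenize_french` and the `KEEP_WHOLE` exception list are hypothetical, and the rule (split after every apostrophe unless the word is listed as an exception) is an assumption that a real French tokenizer would need to refine.

```python
import re

# Hypothetical, non-exhaustive list of words whose apostrophe is not an elision.
KEEP_WHOLE = {"aujourd'hui"}


def tokenize_french(text):
    """Split text into lowercase tokens, cutting elided forms after the apostrophe."""
    # Normalise case and the typographic apostrophe to a plain one.
    text = text.lower().replace("\u2019", "'")
    tokens = []
    # First grab whole words, including any internal apostrophes.
    for word in re.findall(r"\w+(?:'\w+)*", text):
        if "'" not in word or word in KEEP_WHOLE:
            tokens.append(word)
        else:
            # Assumed rule: the elided clitic keeps its apostrophe ("l'", "j'").
            tokens.extend(re.findall(r"\w+'|\w+", word))
    return tokens


print(tokenize_french("L'achat"))
print(tokenize_french("J'ai fait l'achat aujourd'hui"))
```

With this rule, "L'achat" yields two tokens ("l'" and "achat") and "j'ai" yields "j'" and "ai", while "aujourd'hui" stays whole only because it is listed explicitly.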

Cheers, Lise

lvaudor · Dec 14 '21 10:12