proustr
tokenization in French
Hi Colin,
I'm using tidytext for tokenization, but I'm running into problems with French texts. For instance, elided forms such as "L'achat" or "j'ai" are not split at the apostrophe as they should be. In a tidytext issue you mentioned that you were working on a tokenizer that would handle French well, and I got the impression that it was intended for the proustr package. Can you tell me more about it?
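To make the expected behaviour concrete, here is a minimal base-R sketch of the kind of split I have in mind. The function name `split_elisions` and the list of elided prefixes are my own assumptions for illustration; this is not how tidytext or proustr work, just a rough workaround.

```r
# Hypothetical helper (not part of tidytext or proustr):
# split French elided forms such as "L'achat" into "L'" and "achat".
split_elisions <- function(x) {
  # Assumed list of elided prefixes: c', d', j', l', m', n', s', t', qu'
  # Matches either a straight (') or typographic (\u2019) apostrophe.
  pattern <- "(?i)\\b((?:[cdjlmnst]|qu)['\u2019])"
  # Insert a space after the elided prefix, keeping the apostrophe with it
  spaced <- gsub(pattern, "\\1 ", x, perl = TRUE)
  # Split on whitespace and drop any empty strings
  tokens <- unlist(strsplit(spaced, "\\s+"))
  tokens[nzchar(tokens)]
}

split_elisions("L'achat")
split_elisions("j'ai")
```

With this sketch, "L'achat" would come out as "L'" and "achat" instead of a single token. Words like "aujourd'hui" are left intact because the prefix has to start at a word boundary.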
Cheers, Lise