proustr

Tools for Natural Language Processing in French and texts from Marcel Proust's collection "A La Recherche Du Temps Perdu"

Results 10 proustr issues

Hi Colin, I'm using tidytext for tokenization, but have some problems with texts in French. For instance "L'achat" or "j'ai" are not separated as they should be. In [an issue...
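One possible workaround for the elision problem described above is to pre-process the text before tokenizing. The sketch below uses base R only; `split_elisions()` and `tokenize_fr()` are hypothetical helper names, not part of tidytext or proustr, and the list of elided clitics is an illustrative assumption rather than a complete inventory.

```r
# Hypothetical helper (not a tidytext or proustr function): insert a space
# after an elided clitic (l', j', d', qu', ...) so that a whitespace
# tokenizer separates it from the following word.
split_elisions <- function(x) {
  # \\b keeps words such as "aujourd'hui" intact, because the "d" there
  # does not start a word. Both the straight and the curly apostrophe
  # (U+2019) are accepted and rewritten as a straight apostrophe.
  gsub("\\b(qu|[cdjlmnst])['\u2019]", "\\1' ", x,
       ignore.case = TRUE, perl = TRUE)
}

# Hypothetical whitespace tokenizer for a single string.
tokenize_fr <- function(x) {
  strsplit(split_elisions(x), "\\s+")[[1]]
}

tokenize_fr("L'achat que j'ai fait")
```

With this pre-processing step, "L'achat" and "j'ai" each come out as two tokens, while "aujourd'hui" stays whole.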

Sentiment analysis might work better on stemmed text. Might be an option in `proust_sentiments(stem=TRUE)`

Check for this punctuation "«»““”„‟≪≫《》〝〞〟" and 'ʻʼʽ٬‘’‚‛
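A minimal sketch of how such a check could be handled is to normalize every variant listed above to a plain ASCII quote before further processing. `normalize_quotes()` is a hypothetical name, and mapping guillemets to `"` is an assumption made for illustration; the characters are written as `\u` escapes so the script does not depend on the file encoding.

```r
# Double-quote variants from the list above:
# U+00AB U+00BB U+201C U+201D U+201E U+201F U+226A U+226B
# U+300A U+300B U+301D U+301E U+301F
double_variants <- paste0(
  "[\u00ab\u00bb\u201c\u201d\u201e\u201f\u226a\u226b",
  "\u300a\u300b\u301d\u301e\u301f]"
)

# Single-quote and apostrophe variants from the list above:
# U+02BB U+02BC U+02BD U+066C U+2018 U+2019 U+201A U+201B
single_variants <- "[\u02bb\u02bc\u02bd\u066c\u2018\u2019\u201a\u201b]"

# Hypothetical helper: replace every variant with its ASCII counterpart.
normalize_quotes <- function(x) {
  x <- gsub(double_variants, "\"", x, perl = TRUE)
  gsub(single_variants, "'", x, perl = TRUE)
}

normalize_quotes("\u00ab L\u2019achat \u00bb")
```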

`pr_stem` should let the user choose between several stemming methods (SnowballC, hunspell)

enhancement

Calculation of word rarity, based on: http://www.lexique.org/telLexique.php

Both stemmers should be implemented, with an argument specifying which one to use.
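A rough sketch of what such an argument could look like is shown here. `stem_words()` and its `method` argument are hypothetical and do not reflect the actual `pr_stem` signature. The sketch assumes the SnowballC and hunspell packages are installed and that a French hunspell dictionary named `fr_FR` is available on the system, which is not the case by default.

```r
# Hypothetical wrapper: choose the stemming backend with one argument.
stem_words <- function(words, method = c("snowball", "hunspell")) {
  # match.arg() rejects any value outside the two supported backends.
  method <- match.arg(method)
  if (method == "snowball") {
    SnowballC::wordStem(words, language = "french")
  } else {
    # Assumption: an "fr_FR" dictionary is installed for hunspell.
    stems <- hunspell::hunspell_stem(
      words, dict = hunspell::dictionary("fr_FR")
    )
    # hunspell returns zero or more candidate stems per word. Keeping the
    # last candidate is an arbitrary choice for this sketch; the word
    # itself is used when no stem is found.
    vapply(seq_along(words), function(i) {
      s <- stems[[i]]
      if (length(s) > 0) s[length(s)] else words[i]
    }, character(1))
  }
}
```

A call such as `stem_words(c("chevaux", "mangeaient"), method = "snowball")` would then select the Snowball backend, and `method = "hunspell"` the dictionary-based one.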

enhancement

https://www.rocq.inria.fr/alpage-wiki/tiki-index.php?page=CorpusSequoia
http://www.llf.cnrs.fr/Gens/Abeille/French-Treebank-fr.php
http://lia.univ-avignon.fr/chercheurs/bechet/download_fred.html
http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/

enhancement

Some terms might be misspelled, appearing only once or twice in the dataset, and should be put back in the right spot in the table. `pr_spell_*` or so would take...

enhancement

Allow custom regex in proustr `pr_detect_*`

enhancement

Should `pr_detect_*` have an English version?

enhancement
question