Olivier Delmarcelle issues

Results: 8 issues of Olivier Delmarcelle

Overhead when exporting to PSOCK cluster

**Describe the bug** I believe that a significant (and avoidable) overhead is present when using future_lapply inside another function. I think this might be related to the exportation of `...future.FUN`,...

Allows adding customized rules to the ICU tokenizer

In #896, the usage of stringi's RBBI was considered to improve the tokenization of URLs and tags (following [gagolews/stringi#263](https://github.com/gagolews/stringi/issues/263)). I believe that RBBI rules are also useful for users, as...

Output of fcm(x, context = "window", count = "boolean")

Hello, The result of fcm with the settings `context = "window"` and `count = "boolean"` looks a bit odd to me. I'm writing this as a question as I'm not...

A possible way to deal with elisions

The current tokenization leaves French elisions attached to their words. This causes some sentiment words to not be identified when computing sentiment. For example, "l'abandon" is not identified as negative...

Denominator for "proportionalPol" sentiment computation

It always tickles me that compute_sentiment can yield values outside the [-1;1] range when using the "proportionalPol" method. ``` r library(sentometrics) sample_text id word_count LM_sample #> 1: C'est un abandon...

Danger when using `tokens` from an unordered corpus

I like to tokenize text myself before using compute_sentiment. My usual framework is to start from a quanteda::corpus, from which I create a sento_corpus and a quanteda::tokens object. I just...

Scary cleaning V2

The result of a cleaning with BFG Repo-Cleaner. The cleaning corrects a large number of past commits.

Regarding the frequency used for scoring sloppy phrase queries.

### Description I've been using Lucene (through OpenSearch) for querying and scoring human-written documents (!= logs). I often use sloppy phrase queries to handle language variations that express similar meaning....

type:enhancement