Kohei Watanabe
DFMs whose text unit is the sentence often become too large for a regular laptop, so we need to reduce memory usage with an off-memory matrix such as [**bigstatsr**](https://github.com/privefl/bigstatsr). `big_randomSVD()` seems very...
Unit tests needed for:
- [ ] seed values (character vector, named-numeric vector, dictionary, and invalid values)
- [ ] `predict`, `coef` and `summary` methods
Currently, we need to use both `tokens_replace()` and `stri_replace_all_fixed()` to replace "d.c" with "dc" or "u.s" with "us". Why can't we do this with `tokens_substitute(toks, ".", "")`? This is similar...
See #2381.
Pass the `...` arguments of `print.tokens()` on to `base::print()` so that the quotes around tokens can be hidden. I think the output is easier to read without quotes. ``` r require(quanteda) #> Loading required package: quanteda #> Package version: 4.0.2...
To address #2358 more thoroughly, we should have a C++ function that converts `list(integer(), integer(), ...)` to `std::vector`. This function should remove negative values and `NA_INTEGER`. https://teuder.github.io/rcpp4everyone_en/240_na_nan_inf.html
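A minimal sketch of what such a conversion could look like, written in plain C++ so it compiles without the Rcpp headers. The names `to_texts`, `Text`, `Texts` and `kNaInteger` are hypothetical and not taken from the package. The input is assumed to have been copied out of the R list into `std::vector<std::vector<int>>` already; with Rcpp the same loop would probably run over the list elements directly. It relies on R storing `NA_INTEGER` as the smallest `int`.

```cpp
#include <limits>
#include <vector>

// R represents NA_integer_ as the smallest int (INT_MIN); mirrored here as an
// assumption so that the sketch does not depend on the Rcpp headers.
const int kNaInteger = std::numeric_limits<int>::min();

// Hypothetical type names for one document and a collection of documents.
typedef std::vector<unsigned int> Text;
typedef std::vector<Text> Texts;

// Hypothetical helper: copies token IDs into unsigned vectors, dropping
// NA_INTEGER and any negative value. Zero is kept.
Texts to_texts(const std::vector<std::vector<int>> &input) {
    Texts out;
    out.reserve(input.size());
    for (const std::vector<int> &ids : input) {
        Text text;
        text.reserve(ids.size());
        for (int id : ids) {
            // NA_INTEGER is itself negative; the explicit check documents intent.
            if (id == kNaInteger || id < 0) continue;
            text.push_back(static_cast<unsigned int>(id));
        }
        out.push_back(text);
    }
    return out;
}
```

Filtering once at this boundary would mean that downstream functions never see invalid IDs, so they would not need their own checks.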
The data storage is currently my Dropbox folder. It might be better to have a web server (or dedicated Dropbox account).
There are files in `/construction` that only work on @kbenoit's machine. They should be removed along with some text files.
``` r require(quanteda) #> Loading required package: quanteda #> Warning in .recacheSubclasses(def@className, def, env): undefined subclass #> "packedMatrix" of class "replValueSp"; definition not updated #> Warning in .recacheSubclasses(def@className, def, env):...
I really don't want to install dependencies for functions that I never use. How about taking this on-demand approach? https://github.com/quanteda/quanteda/blob/36c1ee584e663e155f4e9a27e42645585aeaa44c/R/convert.R#L249-L261