Kohei Watanabe

Results 81 issues of Kohei Watanabe

DFMs whose text unit is the sentence often become too large for a regular laptop computer, so we have to reduce memory usage with an off-memory matrix such as [**bigstatsr**](https://github.com/privefl/bigstatsr). `big_randomSVD()` seems very...

enhancement

Unit tests needed for

- [ ] seed values (character vector, named-numeric vector, dictionary, and invalid values)
- [ ] `predict`, `coef` and `summary` methods

Currently, we need to use both `tokens_replace()` and `stri_replace_all_fixed()` to replace "d.c" with "dc" or "u.s" with "us". Why can't we do this with `tokens_substitute(toks, ".", "")`? This is similar...

Pass `print.tokens(...)` to `base::print()` to hide quotes around tokens. I think it is easier to read without quotes. ``` r require(quanteda) #> Loading required package: quanteda #> Package version: 4.0.2...

To address #2358 more thoroughly, we should have a C++ function that converts `list(integer(), integer(), ...)` to `std::vector`. This function should remove negative values and `NA_INTEGER`. See https://teuder.github.io/rcpp4everyone_en/240_na_nan_inf.html
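A minimal sketch of the filtering step described above, written in plain C++ so it compiles without the R or Rcpp headers. The names `kNaInteger`, `Tokens`, `clean_ids()` and `clean_documents()` are hypothetical and not part of quanteda. The sketch assumes that `NA_INTEGER` is stored as `INT_MIN`, as R does, and that zeros should be kept; a real version would take an `Rcpp::List` as input instead of nested vectors.

```cpp
#include <cassert>
#include <climits>
#include <vector>

// Assumption: R stores NA_integer_ as the smallest int value (INT_MIN).
// Defined locally so the sketch does not need the R headers.
constexpr int kNaInteger = INT_MIN;

// Hypothetical alias for the token IDs of one document.
using Tokens = std::vector<int>;

// Copy one document's IDs, dropping NA and negative values.
// INT_MIN is itself negative, so `id < 0` alone would also remove NA;
// the explicit comparison only makes the intent visible.
Tokens clean_ids(const std::vector<int>& ids) {
    Tokens out;
    out.reserve(ids.size());
    for (int id : ids) {
        if (id == kNaInteger || id < 0) {
            continue;  // skip invalid IDs
        }
        out.push_back(id);  // zeros and positive IDs are kept
    }
    return out;
}

// Apply clean_ids() to every document, preserving document order.
// Documents that end up empty are kept as empty vectors.
std::vector<Tokens> clean_documents(const std::vector<std::vector<int>>& docs) {
    std::vector<Tokens> out;
    out.reserve(docs.size());
    for (const auto& ids : docs) {
        out.push_back(clean_ids(ids));
    }
    return out;
}

// Small usage check: NA and negative IDs disappear, order is preserved.
void usage_example() {
    const std::vector<std::vector<int>> docs = {{1, -2, kNaInteger, 3}, {}};
    const std::vector<Tokens> cleaned = clean_documents(docs);
    assert((cleaned[0] == Tokens{1, 3}));
    assert(cleaned[1].empty());
}
```

Keeping empty documents in place means the output has the same length as the input list, so document indices stay aligned with the original object.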

The data storage is currently my Dropbox folder. It might be better to have a web server (or dedicated Dropbox account).

There are files that only work on @kbenoit's machine (`/construction`). They should be removed along with some text files.

``` r require(quanteda) #> Loading required package: quanteda #> Warning in .recacheSubclasses(def@className, def, env): undefined subclass #> "packedMatrix" of class "replValueSp"; definition not updated #> Warning in .recacheSubclasses(def@className, def, env):...

I really don't want to install dependencies for functions that I never use. How about taking this on-demand approach? https://github.com/quanteda/quanteda/blob/36c1ee584e663e155f4e9a27e42645585aeaa44c/R/convert.R#L249-L261