Kohei Watanabe
There is a package called **future.apply** which provides parallelized apply-type functions. It seems that we can parallelize tokenization with `future_lapply()`. ```r require(quanteda) require(future.apply) plan(multiprocess) > corp #corp txt length(txt) [1]...
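A minimal sketch of the idea, assuming the texts can be tokenized independently in chunks and recombined with `c()`. The sample texts and the chunk size are made up for illustration, and `plan(multisession)` is used instead of `plan(multiprocess)` because the latter is deprecated in recent versions of **future**.

```r
library(quanteda)
library(future.apply)

# multisession replaces the deprecated multiprocess plan
plan(multisession, workers = 2)

# Illustrative texts only (not the corpus from the issue)
txt <- c(d1 = "This is the first document.",
         d2 = "Here is a second one.",
         d3 = "And a third document.",
         d4 = "Finally, the fourth.")

# Split the character vector into chunks of two documents each
chunk_id <- ceiling(seq_along(txt) / 2)
chunks <- split(txt, chunk_id)

# Tokenize each chunk in a separate worker
toks_list <- future_lapply(chunks, function(x) quanteda::tokens(x))

# Recombine the per-chunk tokens objects into one
toks <- do.call(c, unname(toks_list))

# Return to sequential processing and release the workers
plan(sequential)
```

Whether this is faster than a single call to `tokens()` depends on the size of the corpus, since the texts and the resulting tokens objects have to be copied between R sessions.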
Following bug #1960, we have to reconsider how we handle paddings.
```
dfmt[, ""]            # error
dfm_select(dfmt, "")  # works
```
R treats empty names as a special case according...
Since we no longer use rownames in the data.frame for docvars, `docvars(dfmt)` returns no information about docnames. We can return docnames as rownames. ```r # Current > rownames(docvars(dfm(c("a", "b")))) [1]...
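Until something like this is decided, the same result can be obtained on the user side. This is only a sketch of a workaround, not the proposed change itself; the example corpus and the `lang` docvar are made up.

```r
library(quanteda)

# Illustrative corpus with one document variable
corp <- corpus(c(d1 = "a", d2 = "b"),
               docvars = data.frame(lang = c("en", "en")))
dfmt <- dfm(tokens(corp))

# Attach docnames to the docvars data.frame as rownames
dv <- docvars(dfmt)
rownames(dv) <- docnames(dfmt)
```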
If `corpus` is the object for the original texts, there shouldn't be `corpus_reshape()`. Even if texts are segmented into sentences or paragraphs, we can apply all preprocessing on the tokens...
I was thinking of adding functions that enhance user experience in v2.0, because internal structural changes stay unnoticed unless there are some "good" (or "bad") things for users. `print_with_docvar()`...
I recently learned that the TEI XML format is becoming popular in the linguistics community. In this format, texts are saved in small chunks with associated meta information (e.g. speaker),...
The EU manifesto example is incorrect, because the Hungarian text, for example, is not in ISO-8859-1. https://readtext.quanteda.io/articles/readtext_vignette.html#reading-one-or-more-text-files However, it is tedious to specify the encoding manually. Why not do it like this? `stri_enc_detect()`...
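One way this could look with **stringi** alone is sketched below. `read_with_detected_encoding()` is a hypothetical helper, not a **readtext** function, and the sample file is created only for the demonstration. Detection is a statistical guess, so the top candidate can be wrong for short texts.

```r
library(stringi)

# Hypothetical helper: read a file as raw bytes, take the most likely
# encoding reported by stri_enc_detect(), and convert the text to UTF-8
read_with_detected_encoding <- function(path) {
  bytes <- stri_read_raw(path)
  guess <- stri_enc_detect(bytes)[[1]]  # data.frame of candidates, best first
  enc <- guess$Encoding[1]
  if (is.na(enc)) stop("could not detect encoding for ", path)
  stri_encode(bytes, from = enc, to = "UTF-8")
}

# Demonstration: write a Hungarian sentence as ISO-8859-2 bytes, then read it back
hu <- "\u00e1rv\u00edzt\u0171r\u0151 t\u00fck\u00f6rf\u00far\u00f3g\u00e9p"
path <- tempfile(fileext = ".txt")
writeBin(stri_encode(hu, from = "UTF-8", to = "ISO-8859-2", to_raw = TRUE)[[1]], path)

out <- read_with_detected_encoding(path)
```

The output is always valid UTF-8, but it only matches the original text when the detected encoding is the right one, so the confidence column returned by `stri_enc_detect()` is worth checking before trusting the result.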
Hello, I found some issues in the Simplified Chinese dictionary. I list them here. 1. 'CF': [中非共和国, 中非*, 班吉]. 中非 is a term used in a general context...
More languages need to be covered:
- [x] English (master)
- [x] Russian
- [x] German
- [x] Spanish
- [x] Portuguese
- [x] Italian
- [x] French...
**quanteda** v1.5 added `nested_scope = "dictionary"` to `tokens_lookup()`. If this argument is used, a new priority rule applies in dictionary lookup. ``` 'DM': [Commonwealth of Dominica, Commonwealth Dominican*, Roseau]...