Kohei Watanabe
I have no good idea how to remove the original tokens in the first example, so I will leave this branch as it is for the moment.
If we were to make our own `glob2regex`, I would want to support the `[]` operator. `[!]` is very useful for reducing false matches. https://en.wikipedia.org/wiki/Glob_(programming)#Unix
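A minimal sketch of what such a converter could look like, assuming only `*`, `?`, `[...]` and `[!...]` need to be handled. The name `glob_to_regex()` is hypothetical and not part of quanteda; it just maps `[!...]` to the regex `[^...]` and escapes common regex metacharacters elsewhere.

```r
# Hypothetical helper (not a quanteda function): convert a glob pattern
# to an anchored regular expression, keeping bracket expressions.
glob_to_regex <- function(glob) {
  chars <- strsplit(glob, "", fixed = TRUE)[[1]]
  n <- length(chars)
  out <- character()
  i <- 1
  while (i <= n) {
    ch <- chars[i]
    if (ch == "*") {
      out <- c(out, ".*")                 # any run of characters
    } else if (ch == "?") {
      out <- c(out, ".")                  # exactly one character
    } else if (ch == "[") {
      negate <- i < n && chars[i + 1] == "!"
      start <- i + 1 + negate             # first character of the set
      closing <- which(chars == "]")
      closing <- closing[closing > start] # the set must not be empty
      if (length(closing) == 0) {
        out <- c(out, "\\[")              # unmatched "[" is a literal
      } else {
        body <- paste(chars[start:(closing[1] - 1)], collapse = "")
        out <- c(out, paste0("[", if (negate) "^" else "", body, "]"))
        i <- closing[1]                   # jump to the closing "]"
      }
    } else if (ch %in% c(".", "\\", "+", "(", ")", "{", "}", "^", "$", "|")) {
      out <- c(out, paste0("\\", ch))     # escape regex metacharacters
    } else {
      out <- c(out, ch)
    }
    i <- i + 1
  }
  paste0("^", paste(out, collapse = ""), "$")
}

glob_to_regex("tax[!i]*")
# "^tax[^i].*$"
grepl(glob_to_regex("tax[!i]*"), c("taxes", "taxation", "taxi"))
# TRUE TRUE FALSE
```

Here `tax[!i]*` keeps "taxes" and "taxation" but drops "taxi", which is the kind of false match plain `tax*` cannot avoid.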
I think that stemming is a mechanical dimension reduction method like feature hashing, so we cannot really expect its output to be "natural" and readable. However, removal of umlauts induces...
Then, `dictionary_select()`? Or just add the arguments to the constructor, as in `dictionary(..., keys = NULL, levels = 1:99)`?
I am fine with `x[key, recur = TRUE]`, which applies `[key]` to all the levels. `[, levels = 1:99]` seems too complex.
No, logical or numeric indices do not work when nested levels have different numbers of keys. Also, I want to apply `x[]` only to nested levels, so `recur` is not very...
The root cause is that `Sys.glob()` does not tell us which part of each file path the "*" matched. https://github.com/quanteda/readtext/blob/555aa7222c255a0cde3e17e983dede0e240857f5/R/utils.R#L164
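One possible workaround, sketched here under assumptions rather than taken from readtext: convert the glob to a regex with one capture group per "*" and extract the groups from the expanded paths. The helper name `glob_captures()` is hypothetical, and it assumes "*" should not match across "/".

```r
# Hypothetical helper (not part of readtext): recover what each "*" in a
# glob pattern matched in a vector of file paths.
glob_captures <- function(pattern, paths) {
  # utils::glob2rx() turns "*" into ".*" and escapes "."
  rx <- utils::glob2rx(pattern, trim.head = FALSE, trim.tail = FALSE)
  # Assumption: one capture group per "*", not crossing directory separators
  rx <- gsub(".*", "([^/]*)", rx, fixed = TRUE)
  m <- regmatches(paths, regexec(rx, paths))
  # Drop the full match and keep only the captured groups
  unname(lapply(m, function(x) unname(x[-1])))
}

paths <- c("data/2019_en.txt", "data/2020_de.txt")
glob_captures("data/*_*.txt", paths)
# list(c("2019", "en"), c("2020", "de"))
```

In practice `paths` would come from `Sys.glob(pattern)`; paths that do not match the regex give an empty character vector.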
Good point. We probably should have both "North Macedonia" and "Macedonia" because many contemporary documents still use the old name.
@danimadrid great job on Spanish!
Hi @sneetsher There is no one working on Arabic. That would be awesome!