Kohei Watanabe

Results 164 comments of Kohei Watanabe

I do not have a good idea for how to remove the original tokens in the first example, so I will leave this branch as it is for the moment.

If we were to make our own `glob2regex`, I would want to support the `[]` operator. `[!]` is very useful for reducing false matches. https://en.wikipedia.org/wiki/Glob_(programming)#Unix
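To make the bracket idea concrete, here is a minimal sketch in Python (not the R implementation under discussion). The function name `glob_to_regex` is hypothetical, and the sketch assumes only `*`, `?`, `[...]` and `[!...]` need to be handled.

```python
import re


def glob_to_regex(pattern: str) -> str:
    """Translate a glob pattern into an anchored regex (illustrative helper only)."""
    out = []
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "*":
            out.append(".*")            # any run of characters
        elif ch == "?":
            out.append(".")             # exactly one character
        elif ch == "[":
            # Find the closing bracket; a leading "!" negates the set and
            # a "]" right after the opening is treated as a literal.
            j = i + 1
            if j < n and pattern[j] == "!":
                j += 1
            if j < n and pattern[j] == "]":
                j += 1
            j = pattern.find("]", j)
            if j == -1:
                out.append(re.escape(ch))   # unmatched "[" is a literal
            else:
                body = pattern[i + 1:j]
                negate = body.startswith("!")
                if negate:
                    body = body[1:]
                body = body.replace("\\", "\\\\")
                if body.startswith("^"):
                    body = "\\" + body      # "^" is literal in this sketch
                out.append("[" + ("^" if negate else "") + body + "]")
                i = j                       # skip past the whole bracket
        else:
            out.append(re.escape(ch))
        i += 1
    return "^" + "".join(out) + "$"


# "un[!i]*" keeps "under" but drops "united", which plain "un*" would match.
rx = glob_to_regex("un[!i]*")
matches = [w for w in ["under", "unfair", "united", "union"] if re.match(rx, w)]
```

Here `[!i]` is what removes the false matches: a plain `un*` would also pick up "united" and "union".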

I think that stemming is a mechanical dimension reduction method like feature hashing, so we cannot really expect its output to be "natural" and readable. However, removal of umlauts induces...

Then, `dictionary_select()`? Or just add the arguments to `dictionary()`, as in `dictionary(..., keys = NULL, levels = 1:99)`?

I am fine with `x[key, recur = TRUE]`, which applies `[key]` to all the levels. `[, levels = 1:99]` seems too complex.
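One possible reading of the recursive selection, sketched in Python with nested dicts standing in for a nested dictionary object. The name `select_keys` and the exact semantics are assumptions: a matching key keeps its whole entry, and a non-matching parent survives only if something beneath it matches.

```python
def select_keys(d, keys, recur=False):
    """Keep entries whose name is in `keys`; with recur=True, also search nested levels.

    Hypothetical helper: it illustrates one interpretation, not the package's behavior.
    """
    keys = set(keys)
    out = {}
    for name, value in d.items():
        if name in keys:
            out[name] = value                      # keep the whole matching entry
        elif recur and isinstance(value, dict):
            sub = select_keys(value, keys, recur=True)
            if sub:                                # keep the parent only if a child matched
                out[name] = sub
    return out


regions = {
    "europe": {"west": ["france", "spain"], "east": ["poland"]},
    "asia": {"east": ["japan"], "south": ["india"]},
}
top_only = select_keys(regions, ["east"])              # nothing at the top level is called "east"
all_levels = select_keys(regions, ["east"], recur=True)
```

Selecting by name works even though "europe" and "asia" hold different sets of keys, which is where positional or logical indices would run into trouble.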

No, logical or numeric indices do not work when nested levels have different numbers of keys. Also, I want to apply `x[]` only to nested levels, so `recur` is not very...

The root cause is that `Sys.glob()` does not tell us which part of each file path "*" matched. https://github.com/quanteda/readtext/blob/555aa7222c255a0cde3e17e983dede0e240857f5/R/utils.R#L164
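A glob call returns only the expanded paths, so the text behind each wildcard has to be recovered separately. A rough Python sketch of one way to do that (not the readtext code): turn each "*" into a capture group and match the expanded path against the resulting regex. The name `wildcard_matches` is hypothetical, and the sketch assumes "*" is the only wildcard and never crosses a "/".

```python
import re


def wildcard_matches(pattern, path):
    """Return the substrings of `path` that each "*" in `pattern` matched, or None."""
    # Escape the literal pieces and join them with one capture group per "*".
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    regex = "([^/]*)".join(literal_parts)
    m = re.fullmatch(regex, path)
    return list(m.groups()) if m else None


captured = wildcard_matches("data/*/speech_*.txt", "data/2019/speech_obama.txt")
```

In this example `captured` holds one string per wildcard, in pattern order, so the pieces can be reused, for instance as document variables.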

Good point. We probably should have both "North Macedonia" and "Macedonia" because many contemporary documents still use the old name.

@danimadrid great job on Spanish!

Hi @sneetsher, there is no one working on Arabic. That would be awesome!