Kenneth Benoit
Well, I was trying to guess what you might have done, but without your code, it's only a guess. If you can include that code, then we can identify the...
We still can't reproduce your problem from this, as you have not included the data objects. But it sounds like `anchor_babies` is not included in your `top_filter`. Note that the...
I like this idea in general. This could be very useful for many reasons, including collocation boundary detection. We already do this in `textstat_collocations()` (don't span punctuation) but keeping track...
What about making a new subclass of tokens object, call it "tokens3" for v3, that is not a list of tokens but a list of a list of tokens, where:...
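A minimal sketch of what such a nested structure might look like, using only base R. The object name `toks3` and the layout (document → sentences → tokens) are illustrative assumptions, not an actual quanteda class:

```r
# Hypothetical "tokens3" layout: each document is a list of sentences,
# and each sentence is a character vector of tokens
toks3 <- list(
  doc1 = list(
    c("This", "is", "sentence", "one", "."),
    c("And", "sentence", "two", ".")
  )
)

# unlisting the third level recovers an ordinary tokens-style list,
# which is what as.tokens() could do for the common case
toks <- lapply(toks3, unlist, use.names = FALSE)
toks$doc1
```

This keeps sentence boundaries available for operations that need them, while flattening is a cheap one-step `unlist()` for everything else.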
We could easily implement it so that it would not, since we'd unlist level three in `as.tokens()` so that it became regular tokens for the vast majority of the tokens...
But in your implementation above, you have a `stri_split_boundaries()` call, then `lapply()` a `paste()`, and then a `tokens_segment()`. Unnesting one part of a list to get the equivalent tokens object...
That sounds good. I'd prefer, though, a slightly different workflow wherein step 1 allows us to choose the sentence segmenter and does not require insertion of markers. This allows...
Very interesting. This is definitely a promising and worthwhile direction for more work. Our existing fcm construction does not span sentences or elements that have been removed, when computing co-occurrence...
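For reference, this is roughly how windowed co-occurrence counting looks with the existing `fcm()`; the toy text is illustrative only:

```r
library(quanteda)

# a tiny example: count co-occurrences within a +/- 2 token window
txt <- c(d1 = "the quick brown fox jumps over the lazy dog")
toks <- tokens(txt, remove_punct = TRUE)

# fcm() with context = "window" counts pairs within the window;
# currently, removed elements do not contribute to co-occurrence counts
fcm(toks, context = "window", window = 2)
```

Extending this so that windows also respect sentence boundaries would require the tokens object to retain those boundaries, which connects back to the nested-tokens idea above.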
We need something faster such as a connection to the Stanford NLP or a C backend.
Mostly moved to the [**spacyr**](https://github.com/kbenoit/spacyr) package, but we will still need a method of selecting features or tokens based on POS tags, which implies a method for recording tags as part of...
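As a rough sketch of the spacyr side of this, `spacy_parse()` already returns POS tags as a column, so selection by tag can be done on the parsed data frame. This assumes a working spaCy installation behind spacyr; the filtering step shown is just one possible approach, not a quanteda API:

```r
library(spacyr)
spacy_initialize()  # requires Python spaCy to be installed

# parse a sentence; pos = TRUE returns universal POS tags per token
parsed <- spacy_parse("Quanteda is an R package for text analysis.",
                      pos = TRUE)

# one way to select tokens by POS tag: keep only the nouns
subset(parsed, pos == "NOUN")$token
```

The open question is how such tags would be carried on a quanteda tokens object itself, so that a `tokens_select()`-style operation could use them directly.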