Kenneth Benoit
Well, I was trying to guess what you might have done, but without your code, it's only a guess. If you can include that code, then we can identify the...
We still can't reproduce your problem from this, as you have not included the data objects. But it sounds like `anchor_babies` is not included in your `top_filter`. Note that the...
I like this idea in general. This could be very useful for many reasons, including collocation boundary detection. We already do this in `textstat_collocations()` (don't span punctuation) but keeping track...
What about making a new subclass of tokens object, call it "tokens3" for v3, that is not a list of tokens but a list of a list of tokens, where:...
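A minimal sketch of what such a nested structure might look like, using only base R. The object name `toks3` and the layout (document → sentences → tokens) are illustrative assumptions, not an actual quanteda class:

```r
# Hypothetical "tokens3" layout: each document is a list of sentences,
# and each sentence is a character vector of tokens
toks3 <- list(
  doc1 = list(
    c("This", "is", "sentence", "one", "."),
    c("And", "sentence", "two", ".")
  )
)

# unlisting the third level recovers an ordinary tokens-style list,
# which is what as.tokens() could do for the common case
toks <- lapply(toks3, unlist, use.names = FALSE)
toks$doc1
```

This keeps sentence boundaries available for operations that need them, while flattening is a cheap one-step `unlist()` for everything else.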
We could easily implement it so that it would not, since we'd unlist level three in `as.tokens()` so that it became regular tokens for the vast majority of the tokens...
But in your implementation above, you have a `stri_split_boundaries()` call, then `lapply()` a `paste()`, and then a `tokens_segment()`. Unnesting one part of a list to get the equivalent tokens object...
That sounds good. I'd prefer, though, a slightly different workflow wherein step 1 allows us to choose the sentence segmenter and does not require insertion of markers. This allows...
Very interesting. This is definitely a promising and worthwhile direction for more work. Our existing fcm construction does not span sentences or elements that have been removed, when computing co-occurrence...
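For reference, this is roughly how windowed co-occurrence counting looks with the existing `fcm()`; the toy text is illustrative only:

```r
library(quanteda)

# a tiny example: count co-occurrences within a +/- 2 token window
txt <- c(d1 = "the quick brown fox jumps over the lazy dog")
toks <- tokens(txt, remove_punct = TRUE)

# fcm() with context = "window" counts pairs within the window;
# currently, removed elements do not contribute to co-occurrence counts
fcm(toks, context = "window", window = 2)
```

Extending this so that windows also respect sentence boundaries would require the tokens object to retain those boundaries, which connects back to the nested-tokens idea above.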
We need something faster such as a connection to the Stanford NLP or a C backend.
Mostly moved to the [**spacyr**](https://github.com/kbenoit/spacyr) package, but we will still need a method of selecting features or tokens based on POS tags, which implies a method for recording tags as part of...
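As a rough sketch of the spacyr side of this, `spacy_parse()` already returns POS tags as a column, so selection by tag can be done on the parsed data frame. This assumes a working spaCy installation behind spacyr; the filtering step shown is just one possible approach, not a quanteda API:

```r
library(spacyr)
spacy_initialize()  # requires Python spaCy to be installed

# parse a sentence; pos = TRUE returns universal POS tags per token
parsed <- spacy_parse("Quanteda is an R package for text analysis.",
                      pos = TRUE)

# one way to select tokens by POS tag: keep only the nouns
subset(parsed, pos == "NOUN")$token
```

The open question is how such tags would be carried on a quanteda tokens object itself, so that a `tokens_select()`-style operation could use them directly.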