Kohei Watanabe comments

Results 164 comments of Kohei Watanabe

Simplify summary()

Something like this? When a corpus is reshaped, `docid_` shows the number of sentences in the original documents. ```r require(quanteda) corp

Simplify summary()

I think the output for `docid_`, `segid_` and `length_` is not bad, although it is redundant for `document_`.

I don't have a strong opinion on this issue, but we need to close it for modularization. Only for the sake of discussion: ``` > summary(corpus_reshape(data_corpus_inaugural)) Corpus consisting of 5018 documents document segment...

Return docnames as row.names or a column

How about this? ```r > docvars(dfm(c("a", "b"))) docname_ docid_ segid_ 1 text1 text1 1 2 text2 text2 1 ``` This is the same as ```r > quanteda:::get_docvars.dfm(dfm(c("a", "b")), user =...

Make ngram wrapper for patterns?

I like the mask idea, but we need to generalize it a bit more to allow selection of ngrams and collocations of different lengths. I will also think about it.

Integrating fcm value weighting

I wrote a small function to compute PMI using FCM a while ago. Do you want to add something like this? ```r > toks fcmt > fcm_pmi

Integrating fcm value weighting

Why don't you start a branch to add a new function called `fcm_weight()` with additional measures? I am happy to assist.

Integrating fcm value weighting

I wrote `fcm_pmi()` as pre-processing for SVD, so I thought it should be in the main package. If it is for network analysis, textstats would be a better place. @eisioriginal how...
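For readers following the PMI discussion, here is a minimal base R sketch of pointwise mutual information weighting on a co-occurrence count matrix. It is not the `fcm_pmi()` mentioned above, whose code is cut off in this listing. The helper name `pmi_weight()`, the toy matrix, and the choice to set undefined cells to zero are all assumptions made for illustration.

```r
# Hypothetical helper: PMI weighting of a feature co-occurrence count matrix.
# Not the fcm_pmi() from the comments; written only to illustrate the idea.
pmi_weight <- function(m) {
  total <- sum(m)
  p_ij <- m / total            # joint probability of each feature pair
  p_i <- rowSums(m) / total    # marginal probability of the row feature
  p_j <- colSums(m) / total    # marginal probability of the column feature
  pmi <- log(p_ij / outer(p_i, p_j))
  # Zero counts give log(0) = -Inf (or NaN for empty rows); this sketch
  # replaces them with 0, which is an assumption, not a fixed convention.
  pmi[!is.finite(pmi)] <- 0
  pmi
}

# Toy symmetric co-occurrence counts for three made-up features.
m <- matrix(c(0, 2, 1,
              2, 0, 1,
              1, 1, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

res <- pmi_weight(m)
print(round(res, 3))
```

Pairs that co-occur more often than their marginal frequencies predict receive positive weights, and pairs that co-occur less often receive negative ones, which is the property that makes PMI a common pre-processing step before a decomposition such as SVD.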

docid returns a factor not a character

Good eye. `docid` is a factor. We very much welcome users' participation via pull requests!
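A short base R sketch of why the factor type matters in practice. The vector below is a made-up stand-in for a factor of document ids, not output from the package.

```r
# Stand-in for a factor of document ids; the values are invented.
ids <- factor(c("text1", "text1", "text2"))

is_chr <- is.character(ids)    # FALSE: a factor is integer codes plus levels
codes <- as.integer(ids)       # the underlying codes, not the document names
ids_chr <- as.character(ids)   # plain character vector of document names

print(is_chr)
print(codes)
print(ids_chr)
```

Code that expects a character vector should therefore convert explicitly with `as.character()` rather than rely on implicit coercion.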

Create a function to compound tokens with skips

It would be more useful and easier to implement `skip` for `window`, which would look like ``` tokens_compound(toks, "not", window = 2, skip = 1) #> [1] "London_not" "not_bad" "a"...