Olivier Delmarcelle
A closely related issue: [mschubert/clustermq/#47](https://github.com/mschubert/clustermq/issues/47)
My idea was to use this result as a basis for computing PMI or NPMI values for word pairs. This requires the joint and marginal probabilities of words to compute...
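For reference, the standard definitions are PMI(x, y) = log(p(x, y) / (p(x) p(y))) and NPMI(x, y) = PMI(x, y) / (-log p(x, y)), which bounds the score to [-1, 1]. The minimal sketch below assumes the joint and marginal probabilities have already been estimated (for example, by dividing co-occurrence and word counts by their totals); the function names are illustrative only.

```python
import math


def pmi(p_xy: float, p_x: float, p_y: float) -> float:
    """Pointwise mutual information: log(p(x, y) / (p(x) * p(y)))."""
    if p_xy == 0.0:
        # The pair never co-occurs: PMI diverges to minus infinity.
        return -math.inf
    return math.log(p_xy / (p_x * p_y))


def npmi(p_xy: float, p_x: float, p_y: float) -> float:
    """Normalized PMI in [-1, 1]: PMI divided by -log p(x, y)."""
    if p_xy == 0.0:
        return -1.0  # conventional value for words that never co-occur
    if p_xy == 1.0:
        return 1.0  # -log(1) == 0; the words always co-occur
    return pmi(p_xy, p_x, p_y) / -math.log(p_xy)


# Independent words (p(x, y) == p(x) * p(y)) score 0.
print(npmi(0.25, 0.5, 0.5))
# Words that only ever appear together score 1.
print(npmi(0.5, 0.5, 0.5))
```

Because of the division by -log p(x, y), NPMI stays comparable across pairs with very different frequencies, which is the usual motivation for preferring it over raw PMI.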
I don't think it is related to the setting of `order`, which simply splits the total number of co-occurrences between the upper and lower triangles of the matrix. It is...
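To illustrate the point about splitting counts, here is a minimal sketch using a hypothetical `cooccurrences` helper (not the package's own implementation). With ordered counting, a pair seen in both directions lands in two cells, (x, y) and (y, x), one on each side of the diagonal. Their sum equals the single unordered count, so the grand total is unchanged.

```python
from collections import Counter


def cooccurrences(tokens, window, ordered):
    """Count token pairs at distance 1..window (hypothetical helper).

    ordered=True keys each pair as (left, right), so x-before-y and
    y-before-x are counted in separate cells.
    ordered=False keys each pair by its sorted form, giving one cell.
    """
    counts = Counter()
    for i, left in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            right = tokens[j]
            key = (left, right) if ordered else tuple(sorted((left, right)))
            counts[key] += 1
    return counts


tokens = ["a", "b", "a"]
ordered_counts = cooccurrences(tokens, window=2, ordered=True)
unordered_counts = cooccurrences(tokens, window=2, ordered=False)
print(dict(ordered_counts))
print(dict(unordered_counts))
```

Here "a" and "b" co-occur twice in total. The ordered version reports this as one ("a", "b") and one ("b", "a"), while the unordered version reports a single ("a", "b") cell with a count of 2.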
Thanks for your answers. @koheiw, this is indeed something I had thought of, but I realized that this approach is fundamentally different because it computes the probability of co-occurrence happening in...
You're right, I'm not sure I'm going in the right direction either. This other example challenges my approach... Using a window of 2, from the point of view of "b", "a"...
I took some time to read a few papers using PMI or NPMI in the hope of coming up with a nice solution. ["Word Association Norms, Mutual Information, and Lexicography"](https://www.aclweb.org/anthology/J90-1003.pdf) mentions the...
I updated the target branch of this pull request, you should be able to merge.
The issue is not about a wrong order: the re-ordering done by `sento_corpus()` is correct. The danger comes from the fact that the initial corpus is unordered, and so is...
I think printing a message, especially when `sento_corpus()` re-orders the corpus, could help. Alternatively, `tokens` could expect a named list where the names represent the texts' IDs.
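The risk of positional matching after a re-ordering, and why keying by text ID avoids it, can be sketched in a few lines. This is a Python analogy under stated assumptions: `align_by_position` and `align_by_id` are hypothetical helpers, not part of any package API, and the dictionary plays the role of the proposed named list.

```python
def align_by_position(doc_ids, values):
    # Pairs the i-th value with the i-th document ID.
    # Silently wrong if doc_ids were re-ordered after values were built.
    return dict(zip(doc_ids, values))


def align_by_id(doc_ids, named_values):
    # Looks each document up by its ID, so the order of doc_ids is irrelevant.
    missing = [d for d in doc_ids if d not in named_values]
    if missing:
        raise KeyError(f"no value for documents: {missing}")
    return {d: named_values[d] for d in doc_ids}


original_ids = ["d2", "d1", "d3"]          # initial, unordered corpus
token_lists = ["tokens2", "tokens1", "tokens3"]
reordered_ids = sorted(original_ids)       # corpus after re-ordering

# Positional matching attaches "tokens2" to "d1": a silent mismatch.
print(align_by_position(reordered_ids, token_lists))
# ID-based matching keeps every text attached to the right document.
print(align_by_id(reordered_ids, dict(zip(original_ids, token_lists))))
```

The ID-based version also fails loudly when a document has no entry, which is arguably preferable to a silent mismatch.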
Just came across this issue. It's niche, but an argument accepting a function that handles the serialization of complex types would be helpful. In my case, I want...
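The truncated comment doesn't show which library is involved, so as an analogy only: Python's standard `json.dumps` already exposes this kind of hook through its `default` parameter. It calls a user-supplied function for any object it cannot serialize natively. The `encode_complex` function below is a hypothetical example of such a handler.

```python
import json
from datetime import date
from decimal import Decimal


def encode_complex(obj):
    """Fallback called by json.dumps for objects it cannot serialize natively."""
    if isinstance(obj, date):
        return obj.isoformat()
    if isinstance(obj, Decimal):
        return str(obj)  # keep exact decimal digits as a string
    if isinstance(obj, complex):
        return {"real": obj.real, "imag": obj.imag}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


record = {"day": date(2020, 1, 31), "price": Decimal("1.10"), "z": 1 + 2j}
print(json.dumps(record, default=encode_complex))
# prints {"day": "2020-01-31", "price": "1.10", "z": {"real": 1.0, "imag": 2.0}}
```

With this design, the library stays agnostic about user types and delegates only the cases it doesn't understand.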