Radim Řehůřek
IMO `Phrases` needs a proper rewrite (both for performance and for cleaning up the logic and the API). Also related to the Bounter optimizations.
Where does Gensim multiply more than 2 matrices (so that multidot has any effect)? Like @gojomo, I'd like to see a practical demonstration of the benefits.
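For context on why the number of matrices matters, here is a minimal sketch. It assumes "multidot" refers to NumPy's `numpy.linalg.multi_dot`; the shapes are made up for illustration and are not taken from Gensim. With two matrices there is only one multiplication order, so `multi_dot` has nothing to optimize. With three or more, the order can change the cost considerably.

```python
import numpy as np

def matmul_cost(m, k, n):
    """Scalar multiplications needed for an (m x k) @ (k x n) product."""
    return m * k * n

# Hypothetical shapes, chosen only to make the effect visible.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5))
B = rng.standard_normal((5, 200))
C = rng.standard_normal((200, 5))

# (A @ B) @ C builds a 200 x 200 intermediate first.
cost_left = matmul_cost(200, 5, 200) + matmul_cost(200, 200, 5)
# A @ (B @ C) builds a 5 x 5 intermediate first.
cost_right = matmul_cost(5, 200, 5) + matmul_cost(200, 5, 5)

# multi_dot picks the multiplication order itself; the result is unchanged.
chained = np.linalg.multi_dot([A, B, C])
plain = A @ B @ C

# With only two operands there is a single possible order.
pair = np.linalg.multi_dot([A, B])

print(cost_left, cost_right)
```

Whether any code path in Gensim has a chain like this is exactly the open question, so the sketch shows only when a benefit is possible, not that one exists.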
Sorry, without a clear demonstration of its benefit, I'm -1 on introducing changes. If there's a compelling use-case, such as a speed-up when training a new model, or calculating document...
Pretty nice! I'll look into this after the 4.2 release.
The `*MatrixSimilarity` stuff is the oldest part of Gensim, along with the LsiModel. It dates back to DML-CZ days, in ancient pre-history :) (definitely pre-GitHub). To me it makes perfect...
Isn't there a way to factor out such "constant" contributions? Btw the sparse implementation in scipy is pretty "dumb" (in the sense of unoptimized). So writing a specialized for-loop or...
Okay, up to you. I'm not familiar with the motivation for BM25+, BM25L etc., so I can't judge how relevant or in-demand they are among users.
@Witiko busy ATM, sorry. I plan to get to this in June – is there anything urgent riding on the merge from your side? Some deadlines?
Code looks nice and clean, sorry for taking so long to review. @mpenkov anything else we need before merge? @Witiko how about post-merge? What can we do to promote this...