Radim Řehůřek
Thanks for investigating! TBH I'm not a fan of all the if-then logic. It will be really hard to maintain. When does numpy actually drop the deprecated `RandomState`? We should...
Yeah, weird. Could you please re-trigger the PRs that previously failed because of coverage?
@snollygoster123123 I remember other users reporting similar issues, since we switched the LDA default precision from double (float64) to single (float32) in #1656. Can you try this https://github.com/RaRe-Technologies/gensim/issues/217#issuecomment-435539481 and let...
Ping @snollygoster123123 @menshikh-iv: are you able to provide a reproducible example? We'll have a look.
Gensim code in question: https://github.com/RaRe-Technologies/gensim/blob/7e898f492ddd784962c58395a358998ee7ffb831/gensim/models/word2vec_inner.pyx#L217-L243 This optimization seems fairly deep down the (C) stack; @gojomo what is your intuition re. its impact on end-to-end performance?
OK, makes sense. Are you able to do that quick sanity check, estimating an upper bound on the achievable speed-up?
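One way such a sanity check could be done (an assumption; the thread does not say how) is an Amdahl's-law estimate: profile the run, take the fraction of total time spent in the code path being optimized, and compute the speed-up you would get if that part cost nothing. The helper name `max_speedup` and the 20% figure below are hypothetical, for illustration only.

```python
def max_speedup(fraction_optimized: float) -> float:
    """Amdahl's-law upper bound on end-to-end speed-up.

    fraction_optimized: share of total runtime spent in the code path
    being optimized (taken from a profiler), in the range [0, 1).
    The bound assumes that path becomes infinitely fast.
    """
    if not 0.0 <= fraction_optimized < 1.0:
        raise ValueError("fraction_optimized must be in [0, 1)")
    return 1.0 / (1.0 - fraction_optimized)


# Hypothetical number, not a measurement: suppose the profiler attributes
# 20% of training time to the inner loop in question.
print(f"upper bound on speed-up: {max_speedup(0.20):.2f}x")
```

If the bound comes out close to 1x, the optimization is unlikely to matter end to end, however fast the inner loop becomes.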
That doesn't seem to be the right place for an input format check. If your corpus is malformed, then even if you skip the "force-term-ids-to-int" code path, the code will...
+1 on matching FB's logic. What is "trial-count"? Is the average taken over words or something else?
@gojomo cleaning up the loss-tallying logic is still very much welcome. Did you figure out the "increasing loss" mystery? We're planning to make a Gensim release soon – whether this PR...
@bvinesh we're planning a new release of Gensim soon. We could include this PR if you finish it – please let us know (or close it).