Radim Řehůřek comments

314 comments by Radim Řehůřek

Updated RandomState (deprecated in NumPy) to default_rng (Generator)

Thanks for investigating! TBH I'm not a fan of all the if-then logic. It will be really hard to maintain. When does numpy actually drop the deprecated `RandomState`? We should...
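One way to avoid branching on the RNG type is to pass every seed form through `numpy.random.default_rng`. It accepts `None`, an int, a `SeedSequence`, a `BitGenerator` or an existing `Generator`, and returns a `Generator` unaltered. Below is a minimal sketch, assuming NumPy ≥ 1.17. The helper name `get_rng` is hypothetical and not a Gensim function. Note that the legacy `RandomState` (MT19937) and `default_rng` (PCG64) produce different streams for the same seed, so seeded results will change after such a switch.

```python
import numpy as np


def get_rng(seed=None):
    """Hypothetical helper: normalize any accepted seed form into a numpy Generator.

    np.random.default_rng already handles None, ints, SeedSequence, BitGenerator
    and Generator inputs, so no if-then branching is needed here.
    """
    return np.random.default_rng(seed)


# Legacy API vs. Generator API: some method names differ as well.
legacy = np.random.RandomState(42)
modern = get_rng(42)

legacy_ints = legacy.randint(0, 10, size=5)    # RandomState.randint
modern_ints = modern.integers(0, 10, size=5)   # Generator.integers replaces randint
modern_floats = modern.random(3)               # Generator.random replaces rand/random_sample

# Passing an existing Generator hands back the very same object.
assert get_rng(modern) is modern
```

The only compatibility logic left is at call sites that use renamed methods such as `randint` vs. `integers`.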

Testing with code coverage enabled causes tests to hang

Yeah, weird. Could you please re-trigger the PRs that previously failed because of coverage?

Exploding Perplexity for big number of topics

@snollygoster123123 I remember other users reporting similar issues, since we switched the LDA default precision from double (float64) to single (float32) in #1656. Can you try this https://github.com/RaRe-Technologies/gensim/issues/217#issuecomment-435539481 and let...
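Single precision can matter for a few generic reasons. float32 has a machine epsilon of about 1.2e-7, and `exp(x)` underflows to zero for arguments below about -104. float64 is far more forgiving on both counts. The sketch below only illustrates these NumPy effects and is not a diagnosis of this particular perplexity issue. For a comparison run, Gensim's `LdaModel` accepts a `dtype` argument (e.g. `dtype=np.float64`). Setting it is one way to check whether the precision switch is responsible.

```python
import numpy as np

# Machine epsilon: the gap between 1.0 and the next representable number.
eps32 = np.finfo(np.float32).eps   # about 1.19e-07
eps64 = np.finfo(np.float64).eps   # about 2.22e-16

# A small increment vanishes in float32 but survives in float64.
lost_in_32 = np.float32(1.0) + np.float32(1e-8) == np.float32(1.0)
kept_in_64 = np.float64(1.0) + np.float64(1e-8) != np.float64(1.0)

# exp() of a large negative log-probability underflows to 0.0 in float32;
# a subsequent log(0) gives -inf, one way a likelihood bound can blow up.
tiny32 = np.exp(np.float32(-110.0))
tiny64 = np.exp(np.float64(-110.0))

print(eps32, eps64, lost_in_32, kept_in_64, tiny32, tiny64)
```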

Exploding Perplexity for big number of topics

Ping @snollygoster123123 @menshikh-iv: are you able to provide a reproducible example? We'll have a look.

potential 'alias method' negative-sampling optimization from 'Koan' paper

Gensim code in question: https://github.com/RaRe-Technologies/gensim/blob/7e898f492ddd784962c58395a358998ee7ffb831/gensim/models/word2vec_inner.pyx#L217-L243 This optimization sits fairly deep down the (C) stack; @gojomo, what is your intuition regarding its impact on end-to-end performance?
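For context, the two sampling strategies under discussion can be sketched in pure Python. The first is a cumulative-table lookup with binary search, O(log V) per draw. As far as I can tell, this is what the linked `word2vec_inner.pyx` lines do with Gensim's `cum_table`. The second is Vose's alias method: O(V) setup, then O(1) per draw. Weights use the usual word2vec unigram^0.75 smoothing. The counts and all function names here are made up for illustration.

```python
import bisect
import random


def build_cum_table(weights):
    """Cumulative sums of the weights; sampling is then a binary search."""
    cum, running = [], 0.0
    for w in weights:
        running += w
        cum.append(running)
    return cum


def cum_table_draw(cum, rng):
    """O(log n) per draw: first index whose cumulative weight exceeds r."""
    r = rng.random() * cum[-1]
    return bisect.bisect_right(cum, r)


def build_alias(weights):
    """Vose's alias method: O(n) setup producing a probability and alias table."""
    n = len(weights)
    total = float(sum(weights))
    scaled = [w * n / total for w in weights]   # mean of scaled is exactly 1
    prob, alias = [0.0] * n, [0] * n
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l        # column s: keep s w.p. scaled[s], else l
        scaled[l] = scaled[l] + scaled[s] - 1.0 # l donated the rest of column s
        (small if scaled[l] < 1.0 else large).append(l)
    for i in large + small:                      # leftovers from floating-point rounding
        prob[i] = 1.0
    return prob, alias


def alias_draw(prob, alias, rng):
    """O(1) per draw: pick a column uniformly, then a biased coin flip."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


counts = [100, 50, 10, 1, 0, 300]               # made-up word frequencies
weights = [c ** 0.75 for c in counts]           # unigram^0.75 smoothing
prob, alias = build_alias(weights)
cum = build_cum_table(weights)
rng = random.Random(0)
sample = [alias_draw(prob, alias, rng) for _ in range(10)]
```

Both samplers draw from the same distribution. The alias method trades a second table of size V for removing the binary search from the inner loop.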

potential 'alias method' negative-sampling optimization from 'Koan' paper

OK, makes sense. Are you able to do that quick sanity check, estimating an upper bound on the achievable speed-up?
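Amdahl's law gives one way to bound the end-to-end gain before writing any Cython. Suppose negative-sample lookups take a fraction p of total training time, and the alias method makes them s times faster. The overall speed-up is then 1 / ((1 - p) + p / s), which can never exceed 1 / (1 - p). The fractions below are hypothetical placeholders; the real p would have to come from profiling.

```python
def amdahl_speedup(p, s):
    """Overall speed-up when a fraction p of runtime is accelerated s-fold."""
    if not 0.0 <= p <= 1.0 or s <= 0:
        raise ValueError("need 0 <= p <= 1 and s > 0")
    return 1.0 / ((1.0 - p) + p / s)


def amdahl_upper_bound(p):
    """Limit of amdahl_speedup as s grows without bound (sampling becomes free)."""
    return float("inf") if p == 1.0 else 1.0 / (1.0 - p)


for p in (0.05, 0.10, 0.20):                    # hypothetical profiled fractions
    print(p, round(amdahl_speedup(p, 4.0), 3), round(amdahl_upper_bound(p), 3))
```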

LdaModel constructor is missing a dimensionality check

That doesn't seem to be the right place for an input format check. If your corpus is malformed, then even if you skip the "force-term-ids-to-int" code path, the code will...
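If a corpus check is wanted at all, one option is to validate the bag-of-words stream once, up front, rather than inside the `LdaModel` constructor. Below is a minimal sketch under that assumption. `validate_bow_corpus` is a hypothetical helper, not part of Gensim. It expects the usual bag-of-words format, where each document is a list of `(term_id, weight)` pairs.

```python
import numbers


def validate_bow_corpus(corpus, num_terms):
    """Hypothetical up-front check (not a Gensim API).

    Every document must be a list of (term_id, weight) pairs with an integral
    term_id in [0, num_terms) and a real-valued weight. Returns the number of
    documents seen; raises ValueError on the first malformed entry.
    """
    n_docs = 0
    for doc_no, doc in enumerate(corpus):
        for item in doc:
            if not (isinstance(item, (tuple, list)) and len(item) == 2):
                raise ValueError(f"doc {doc_no}: expected a (term_id, weight) pair, got {item!r}")
            term_id, weight = item
            # bool is an Integral subclass, so reject it explicitly
            if isinstance(term_id, bool) or not isinstance(term_id, numbers.Integral):
                raise ValueError(f"doc {doc_no}: term id {term_id!r} is not an integer")
            if not 0 <= term_id < num_terms:
                raise ValueError(f"doc {doc_no}: term id {term_id} outside [0, {num_terms})")
            if isinstance(weight, bool) or not isinstance(weight, numbers.Real):
                raise ValueError(f"doc {doc_no}: weight {weight!r} is not a number")
        n_docs += 1
    return n_docs
```

A call would look like `validate_bow_corpus(corpus, num_terms=len(id2word))`. For a streamed corpus, this check costs one full extra pass over the data, which is a reason to keep it out of the model constructor.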

[early WIP] Fix/rationalize loss-tallying

+1 on matching FB's logic. What is "trial-count"? Is the average taken over words or something else?
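The denominator matters because the same tally gives different numbers depending on what is counted. Below is a hedged sketch with made-up losses. It contrasts an average over individual predictions (e.g. one positive plus k negative samples per word) with an average over target words. It does not assert which of the two fastText uses, and the class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LossTally:
    """Hypothetical running tally; which count to divide by is the open question."""
    total_loss: float = 0.0
    words: int = 0        # target words processed
    examples: int = 0     # individual predictions (positive + negative samples)

    def add_word(self, prediction_losses):
        """Record the losses of all predictions made for one target word."""
        self.total_loss += sum(prediction_losses)
        self.words += 1
        self.examples += len(prediction_losses)

    def mean_per_word(self):
        return self.total_loss / self.words if self.words else 0.0

    def mean_per_example(self):
        return self.total_loss / self.examples if self.examples else 0.0


tally = LossTally()
tally.add_word([0.7, 0.1, 0.2])   # made-up: 1 positive + 2 negative samples
tally.add_word([0.5, 0.3, 0.1])
print(tally.mean_per_word(), tally.mean_per_example())
```

With k negatives per word, the two averages differ by a factor of about k + 1. Reported numbers are therefore only comparable across tools when they use the same denominator.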

[early WIP] Fix/rationalize loss-tallying

@gojomo cleaning up the loss-tallying logic is still very much welcome. Did you figure out the "increasing loss" mystery? We're planning to make a Gensim release soon – whether this PR...
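One thing worth ruling out on the "increasing loss" question is a cumulative tally. If the reported figure is a running total that is not reset between epochs, it will grow monotonically even while training improves. Gensim exposes a training-loss tally through `compute_loss=True` and `get_latest_training_loss()`. Whether it resets per epoch may depend on the version, so treat that as something to verify rather than a known cause. Differencing successive readings recovers per-epoch values. The sketch below uses made-up readings.

```python
def per_epoch_losses(cumulative):
    """Turn successive running-total readings into per-epoch increments."""
    previous = 0.0
    deltas = []
    for reading in cumulative:
        deltas.append(reading - previous)
        previous = reading
    return deltas


readings = [1200.0, 2100.0, 2850.0, 3500.0]   # made-up cumulative readings, one per epoch end
print(per_epoch_losses(readings))
```

In Gensim, such readings would typically be collected in the `on_epoch_end` method of a `CallbackAny2Vec` subclass (`gensim.models.callbacks`).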

Fix word2vec doc-comment example of KeyedVectors usage, issue 2669

@bvinesh we're planning a new release of Gensim soon. We could include this PR if you finish it – please let us know (or close it).