Ivan Menshikh

52 comments by Ivan Menshikh

Thanks for the report, @init-random; I reproduced your problem with the latest version of gensim.

Just for the record: I also got a very large perplexity value with `gensim==3.7.1` (even bigger than @snollygoster123123's) when training on a very large corpus (13.5M documents, an 850k-term dictionary, 0.018% density), **but**: - I...
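
To put those numbers in context, here is a small sketch of two quantities from the comment above. It assumes that "perplexity" means the value gensim logs next to `LdaModel.log_perplexity`, which returns a per-word likelihood bound and reports `2 ** (-bound)` as the perplexity estimate, and that "density" means non-zero entries divided by (documents * terms). The helper names are hypothetical.

```python
def perplexity_from_bound(per_word_bound: float) -> float:
    """Convert a per-word log-likelihood bound (base 2) into a perplexity estimate.

    Assumes the convention gensim uses when logging LdaModel.log_perplexity:
    perplexity = 2 ** (-per_word_bound).
    """
    return 2.0 ** (-per_word_bound)


def corpus_density(num_nonzero: int, num_docs: int, num_terms: int) -> float:
    """Fraction of non-zero cells in the documents x terms matrix (assumed definition)."""
    return num_nonzero / (num_docs * num_terms)


def nonzeros_for_density(density: float, num_docs: int, num_terms: int) -> float:
    """Inverse of corpus_density: how many non-zero entries a given density implies."""
    return density * num_docs * num_terms


# Perplexity grows exponentially as the bound becomes more negative.
print(perplexity_from_bound(-10.0))  # 1024.0
print(perplexity_from_bound(-20.0))  # 1048576.0

# Statistics quoted above: 13.5M documents, 850k terms, 0.018% density.
nnz = nonzeros_for_density(0.018 / 100, 13_500_000, 850_000)
print(f"~{nnz:.3g} non-zero entries, ~{nnz / 13_500_000:.0f} distinct terms per document")
```

Because of the exponent, a bound that is only a few units worse already multiplies the reported perplexity many times over, so a "very large" value does not by itself show how far off the model is.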

@piskvorky no, I can't (for NDA reasons), sorry. I guess you can try to reproduce it with any large corpus whose statistics are similar to those in my previous message.
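
If no real corpus with those statistics is at hand, a random one can approximate them. This is a minimal sketch, assuming density means non-zero entries divided by (documents * terms) and that term counts are uniform in 1..5. `SyntheticBowCorpus` is a hypothetical helper, not part of gensim; it yields documents in the bag-of-words format (lists of `(term_id, count)` pairs) that gensim models such as `LdaModel` consume.

```python
import random
from typing import Iterator, List, Tuple

BowDoc = List[Tuple[int, int]]  # gensim-style bag-of-words: (term_id, count) pairs


class SyntheticBowCorpus:
    """Restartable random corpus with a target size, vocabulary and density.

    Hypothetical helper for reproduction attempts. Each pass re-seeds the RNG,
    so repeated iterations (e.g. several training passes) yield the same documents.
    """

    def __init__(self, num_docs: int, num_terms: int, density: float,
                 max_count: int = 5, seed: int = 42) -> None:
        self.num_docs = num_docs
        self.num_terms = num_terms
        # Distinct terms per document implied by density = nnz / (docs * terms).
        self.terms_per_doc = max(1, round(density * num_terms))
        self.max_count = max_count
        self.seed = seed

    def __len__(self) -> int:
        return self.num_docs

    def __iter__(self) -> Iterator[BowDoc]:
        rng = random.Random(self.seed)
        for _ in range(self.num_docs):
            # Distinct term ids per document, then a random count for each.
            ids = rng.sample(range(self.num_terms), self.terms_per_doc)
            yield sorted((i, rng.randint(1, self.max_count)) for i in ids)


# Small-scale run; scale num_docs / num_terms up towards the real statistics.
corpus = SyntheticBowCorpus(num_docs=1000, num_terms=50_000, density=0.018 / 100)
first = next(iter(corpus))
print(len(corpus), corpus.terms_per_doc, first[:3])
```

Making the corpus a class with `__iter__` rather than a plain generator matters here: a generator can be consumed only once, while training usually needs several passes.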

@gojomo I received one more report of this problem. Maybe we should raise an exception for this case (when `update=True`), because it happens more and more often (until we fix the bug itself...
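
A minimal sketch of the fail-fast idea suggested above: refuse `update=True` with a clear error instead of letting a broken code path run. `GuardedModel` and `OnlineVocabUpdateError` are hypothetical names for illustration only; this is not gensim's code.

```python
class OnlineVocabUpdateError(NotImplementedError):
    """Raised instead of letting an unsupported incremental vocabulary update proceed."""


class GuardedModel:
    """Toy stand-in for a model whose build_vocab(update=True) path is known to be broken."""

    def __init__(self) -> None:
        self.vocab: dict = {}

    def build_vocab(self, documents, update: bool = False) -> None:
        if update:
            # Fail fast with an explicit message rather than crashing later in training.
            raise OnlineVocabUpdateError(
                "build_vocab(update=True) is not supported for this model yet; "
                "rebuild the vocabulary from scratch on the full corpus instead."
            )
        self.vocab = {}
        for doc in documents:
            for token in doc:
                self.vocab[token] = self.vocab.get(token, 0) + 1


model = GuardedModel()
model.build_vocab([["hello", "world"], ["hello"]])
try:
    model.build_vocab([["new", "words"]], update=True)
except OnlineVocabUpdateError as exc:
    print(f"refused: {exc}")
```

Subclassing `NotImplementedError` keeps the error easy to catch generically while signalling that the restriction is temporary.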

@khulasaandh as far as I know, you can call `infer_vector` for the new document and then calculate the similarity values you need.
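
The second step boils down to ranking stored vectors by cosine similarity to the inferred one. The sketch below shows that in pure Python. It assumes `query` would come from `model.infer_vector(new_tokens)` and `doc_vectors` from the trained document vectors; toy values stand in for both, and the helper names are hypothetical. gensim's own `most_similar` on the document vectors does this ranking for you.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: Sequence[float],
                       doc_vectors: Dict[str, Sequence[float]],
                       topn: int = 3) -> List[Tuple[str, float]]:
    """Return the topn (tag, similarity) pairs, most similar first."""
    scored = [(tag, cosine(query, vec)) for tag, vec in doc_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Toy stand-ins for trained document vectors and an inferred query vector.
doc_vectors = {"doc_a": [1.0, 0.0, 0.0], "doc_b": [0.7, 0.7, 0.0], "doc_c": [0.0, 0.0, 1.0]}
query = [0.9, 0.1, 0.0]
print(rank_by_similarity(query, doc_vectors, topn=2))
```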

@khulasaandh this looks really suspicious (your code is correct). Can you share the data (the trained model and `token_list`) so I can reproduce this error?

Big thanks @khulasaandh, reproduced with `Python 2.7.14 (default, Sep 23 2017, 22:06:14) [GCC 7.2.0] on linux2`.

Segfault moment:

```python
In [6]: model.train(sentences_2, total_examples=model.corpus_count, epochs=model.iter)
/home/ivan/.virtualenvs/math/bin/ipython:1: DeprecationWarning: Call to deprecated `iter`...
```

@Diego999 so your Dictionary is probably very large. Can you reduce the size of the dictionary and try again? Based on the log messages, it is still being initialized.
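
In gensim the usual way to shrink a dictionary is `Dictionary.filter_extremes(no_below=5, no_above=0.5, keep_n=100000)`. The sketch below reimplements the same idea in pure Python so the effect of each parameter is visible; it is an illustration, not gensim's implementation, and its tie-breaking (by token) is an assumption. `prune_vocabulary` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, List, Set


def prune_vocabulary(docs: Iterable[List[str]], no_below: int = 5,
                     no_above: float = 0.5, keep_n: int = 100_000) -> Set[str]:
    """Return the tokens that survive document-frequency filtering.

    * drop tokens that appear in fewer than `no_below` documents,
    * drop tokens that appear in more than `no_above` (a fraction) of documents,
    * of the rest, keep at most `keep_n` tokens with the highest document frequency.
    """
    doc_freq: Counter = Counter()
    num_docs = 0
    for doc in docs:
        num_docs += 1
        doc_freq.update(set(doc))  # count each token once per document
    max_df = no_above * num_docs
    kept = [(tok, df) for tok, df in doc_freq.items() if no_below <= df <= max_df]
    kept.sort(key=lambda pair: (-pair[1], pair[0]))  # highest df first, ties by token
    return {tok for tok, _ in kept[:keep_n]}


docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"], ["a", "bird"]]
print(sorted(prune_vocabulary(docs, no_below=2, no_above=0.6, keep_n=10)))  # ['cat', 'sat']
```

Here "the" is dropped for appearing in too many documents and the singletons for appearing in too few; on a real corpus, `keep_n` is what caps memory.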

I partially agree with @gojomo: this really is a security risk, precisely because a "GitHub release" has no immutability guarantee (that guarantee exists only by our agreement). I'm +1...
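
One common mitigation for mutable release assets, offered here as an assumption rather than what the project adopted, is to pin each asset's checksum somewhere the uploader cannot silently change (for example, in the version-controlled code) and verify every download against it. A minimal sketch with SHA-256; the function names and file contents are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> None:
    """Raise ValueError if the file does not match the checksum pinned at release time."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"checksum mismatch for {path}: expected {expected_sha256}, got {actual}")


with tempfile.TemporaryDirectory() as tmp:
    asset = Path(tmp) / "model.bin"
    asset.write_bytes(b"release asset")
    pinned = sha256_of(asset)  # in practice published separately from the asset itself
    verify_download(asset, pinned)  # passes
    asset.write_bytes(b"tampered asset")  # simulate a silently replaced release file
    try:
        verify_download(asset, pinned)
    except ValueError as exc:
        print("rejected:", exc)
```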

Hi @psorianom, can you please add tests for this function (at least for the case you described in the issue)?