gensim version

Status: Open. engineersunny opened this issue 2 years ago • 4 comments

Hi, I'm using gensim 3.8.0, but I still get this error when I run extract_features():

'Doc2Vec' object has no attribute 'neg_labels'

Is there a way to avoid this?

engineersunny, Mar 04 '22 06:03

That's strange.

I found this quote on a Stack Overflow post:

that error resembles a very-old bug which only showed up if Gensim was not fully installed to have the necessary Cython-optimized routines for fast training/inference operations. (That caused some older, seldom-run code to be run that had a dependency on the missing neg_labels. Newer versions of Gensim have eliminated that slow code-path entirely.)

Would you mind bumping the dependency version to gensim 3.8.3 and trying again? According to the release notes, there's a particular fix, "Fix missing C extensions", which might help given the above quote.

lowecg, Mar 05 '22 08:03
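Before and after trying the upgrade suggested above, it can help to confirm which gensim is actually installed and whether its compiled routines were loaded. The script below is a hypothetical diagnostic, not part of Sherlock: the name gensim_diagnostics is made up, it assumes Python 3.8+ (for importlib.metadata), and it assumes gensim.models.doc2vec exposes FAST_VERSION, as the assertion in paragraph_vectors.py relies on. A value of -1 means the Cython-optimized routines are missing, which is the situation the quoted answer describes.

```python
# Hypothetical diagnostic: report the installed gensim version and whether
# its Cython-compiled training routines were loaded.
from importlib import metadata


def gensim_diagnostics():
    """Return (version, fast_version) for the installed gensim.

    version is None if gensim is not installed. fast_version is None if
    gensim cannot be imported or does not expose doc2vec.FAST_VERSION;
    a value of -1 means the compiled routines are missing.
    """
    try:
        version = metadata.version("gensim")
    except metadata.PackageNotFoundError:
        return None, None
    try:
        from gensim.models import doc2vec
    except Exception:  # e.g. ImportError or a binary-incompatibility error
        return version, None
    # Assumption: FAST_VERSION is exported here (true for the 3.8.x line
    # used in this thread); fall back to None if it is absent.
    return version, getattr(doc2vec, "FAST_VERSION", None)


if __name__ == "__main__":
    version, fast = gensim_diagnostics()
    print("gensim version:", version)
    print("FAST_VERSION:", fast)
    if fast is not None and fast < 0:
        print("Compiled routines missing; consider reinstalling gensim.")
```

Run it in the same environment as the notebook. If FAST_VERSION is -1, reinstalling gensim (for example, pinning 3.8.3 as suggested) may be worth trying before editing any code.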

I just tested gensim 3.8.3.

I had to edit sherlock/features/paragraph_vectors.py and comment out the line:

assert gensim.models.doc2vec.FAST_VERSION > -1, "This will be painfully slow otherwise"

I ran the notebook 01-data-preprocessing.ipynb from a fresh clone of the repo. It ran cleanly, and there was no observable impact on performance, despite the alarming message in the assertion we just disabled.

lowecg, Mar 05 '22 09:03
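One alternative to commenting the assertion out entirely would be to downgrade it to a warning, so that users with an incomplete gensim install are still told why things may be slow. This is a hypothetical change, not what the repository does, and check_fast_version is an illustrative name. As reported later in this thread, disabling the assertion alone did not make the neg_labels error go away for everyone, so this only changes how the slow code path is reported.

```python
import warnings


def check_fast_version(fast_version):
    """Warn, rather than fail, when gensim's compiled routines are missing.

    Returns True if the optimized routines are available (FAST_VERSION > -1),
    otherwise emits a RuntimeWarning and returns False.
    """
    if fast_version > -1:
        return True
    warnings.warn(
        "gensim's Cython-optimized routines are not available "
        f"(FAST_VERSION={fast_version}); training and inference will be slow.",
        RuntimeWarning,
    )
    return False


# In paragraph_vectors.py this could replace the assert, e.g.:
#   import gensim
#   check_fast_version(gensim.models.doc2vec.FAST_VERSION)
```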

Thanks for reporting this issue, @engineersunny, and for testing and sharing your solution, @lowecg. @engineersunny, did this solution work for you?

@lowecg I did not encounter this issue myself. Do you know if this is a general issue (e.g. does it happen in the extract_features_to_csv() function as well)? Do you think we should comment out this line by default?

madelonhulsebos, Mar 10 '22 17:03

@madelonhulsebos I got that assertion error every time I ran the code as well, so I had already commented it out, but I'm still getting the error message above. It might be a gensim version issue, as I couldn't install the older version on my machine.

engineersunny, Mar 14 '22 01:03