AttributeError: 'Doc2Vec' object has no attribute 'dv'
Problem description
A simple call to model.docvecs.most_similar raises an error. The loaded model seems to have no 'dv' attribute at all, since model.dv.most_similar fails in a similar way:
DeprecationWarning: Call to deprecated `docvecs` (The `docvecs` property has been renamed `dv`.).
model.docvecs.most_similar(app, topn=topn)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-8-051320dcdf92> in <module>
2 topn = 10
3
----> 4 model.docvecs.most_similar(app, topn=topn)
~/.pyenv/versions/3.8.6/lib/python3.8/site-packages/gensim/utils.py in new_func1(*args, **kwargs)
1517 stacklevel=2
1518 )
-> 1519 return func(*args, **kwargs)
1520
1521 return new_func1
~/.pyenv/versions/3.8.6/lib/python3.8/site-packages/gensim/models/doc2vec.py in docvecs(self)
316 @deprecated("The `docvecs` property has been renamed `dv`.")
317 def docvecs(self):
--> 318 return self.dv
319
320 @docvecs.setter
AttributeError: 'Doc2Vec' object has no attribute 'dv'
Versions
Linux-5.4.0-66-generic-x86_64-with-glibc2.29 Python 3.8.6 (default, Dec 23 2020, 13:54:27) [GCC 9.3.0] Bits 64 NumPy 1.20.1 SciPy 1.6.0 gensim 4.0.1 FAST_VERSION 1
Minimal reproducible example please.
import pickle
with open("path_to_the_model", 'rb') as f:  # the saved model, trained on version 3.8.3 (I can't provide it - NDA)
    model = pickle.load(f)
model.docvecs.most_similar(id)  # or model.dv.most_similar(id)
OK. Loading from 3.8.3 should work, that's the last release before 4.0.0.
@gojomo can you think of what's wrong here?
While it was proposed to have specific prior-version models as part of the test suite, for each prior-version intended as 'supported-to-load', it doesn't look like that happened.
Look in test_data or test_data/old_d2v_models: there's nothing past v3.4.0. Also, any test methods to test such older-version models were disabled, crudely, with the expectation they'd be replaced with a smaller set of methods/models testing carefully chosen later versions - but that clearly didn't happen.
If we'd had any "load-3.8.3-Doc2Vec-model-&-do-some-checks-on-it" test, it would have turned up this problem, and perhaps others.
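As a rough illustration of what such a "load and do some checks" test might assert, here is a minimal sketch. The helper name `smoke_check_doc2vec` and the `probe_tag` parameter are hypothetical, not part of Gensim; the only API assumed is the 4.x `model.dv.most_similar(...)` call discussed in this thread, which returns `(tag, similarity)` pairs.

```python
def smoke_check_doc2vec(model, probe_tag, topn=3):
    """Run basic post-load checks on a Doc2Vec-like model; raises on failure.

    Hypothetical helper: `model` is anything exposing `dv.most_similar`,
    and `probe_tag` is a document tag known to exist in the model.
    """
    # Accessing `model.dv` is exactly what failed in the original report.
    dv = model.dv
    # The follow-up failure (missing `norms`) surfaced inside most_similar().
    results = dv.most_similar(probe_tag, topn=topn)
    assert len(results) <= topn, "most_similar returned more than topn results"
    for _tag, score in results:
        # Cosine similarities should lie in [-1, 1], allowing for float error.
        assert -1.0001 <= float(score) <= 1.0001, "similarity out of range"
    return results
```

In a real test suite the model would come from `Doc2Vec.load(...)` on a file saved by 3.8.3; here the function only needs an object with the same shape, so it can also be exercised with a stub.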
As a temporary workaround, it might be enough to, after loading, manually patch dv to exist with the right deserialized attribute (as a proper Doc2Vec backward-compatibility routine, probably in a _load_specials() method, will eventually have to do):
model.dv = model.__dict__['docvecs']
But, this also might just reveal other unhandled upconversion issues.
After model.dv = model.__dict__['docvecs']:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
~/.pyenv/versions/3.8.6/lib/python3.8/site-packages/gensim/models/keyedvectors.py in most_similar(self, positive, negative, topn, clip_start, clip_end, restrict_vocab, indexer)
733 negative = []
734
--> 735 self.fill_norms()
736 clip_end = clip_end or len(self.vectors)
737
~/.pyenv/versions/3.8.6/lib/python3.8/site-packages/gensim/models/keyedvectors.py in fill_norms(self, force)
616
617 """
--> 618 if self.norms is None or force:
619 self.norms = np.linalg.norm(self.vectors, axis=1)
620
AttributeError: 'KeyedVectors' object has no attribute 'norms'
Never mind tests; I can see no migration code in Doc2Vec.load() at all. Am I missing something?
Of course having automated tests for loading older versions would be great, but it looks like nobody is up for implementing that any time soon.
But breaking prior-version compatibility, with or without tests, is against our own policy. Such code should never have been merged (and yes, tests would have discovered the problem instantly, but even without tests, such attribute migration code should have been there, like all the other prior migrations).
Anyway, what now:
1. Check which models are affected (doc2vec only? others too?).
2. Add attribute migration to load, in the standard way. Probably fairly trivial.
3. Ideally, add model-loading tests (much more work than 2).
4. Make a bugfix release.
Not sure who'll get to this or when, but the bug sounds pretty critical. I marked this ticket as "impact HIGH, reach MEDIUM".
Until the bug is resolved, should we downgrade to 3.8.3 if we need to use doc2vec now?
I'd prefer you add the attribute migration to load (open a PR), so we can make a bugfix release.
I don't feel sufficiently familiar with the code base to make any changes (today is my first day using gensim at all), I'm sorry!
OK. Loading from 3.8.3 should work, that's the last release before 4.0.0.
Update: I have 3.8.3 installed, not 4.0.0, and am still having this issue. How far back does it go?
EDIT: I had installed with conda, which is why I had 3.8.3. I went and installed with pip instead (per the advice in #2826), and once I had 4.0.1, the issue was resolved. Not sure if someone has addressed this in a different issue/PR, but this seems to have fixed it for me!
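Because a conda install and a pip install can leave different Gensim versions in place, it may help to confirm which version the running interpreter actually sees before debugging further. The following is a minimal sketch using only the standard library (Python 3.8+); the helper names `installed_version` and `major_version` are made up for illustration.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed distribution's version string, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


def major_version(version_string):
    """Return the leading integer of a version such as '4.0.1', else None."""
    head = version_string.split(".", 1)[0]
    return int(head) if head.isdigit() else None
```

For example, `installed_version("gensim")` reports what is importable in the current environment, and `major_version(...)` on that string distinguishes a 3.x install from a 4.x one.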
FYI: Loading from 3.8.3 does not work using gensim 4.1.2.
Regarding the potential workaround I'd mentioned earlier:
After model.dv = model.__dict__['docvecs']: AttributeError: 'KeyedVectors' object has no attribute 'norms'
It might be sufficient to add model.dv.norms = None to work around this subsequent error.
Hi, I used model.docvec and it worked for me.
I ran into this problem trying to open a 3.8.3 model in 4.0.x (trying to update some old code to run on a modern version, hoping to rely on the one-version-back backwards compatibility to avoid retraining). I succeeded after making a local fork of 4.0.x and patching some things. I am actually up for that "load-3.8.3-Doc2Vec-model-&-do-some-checks-on-it" test; are you still open to a PR on 4.0.x?
No, we don't accept PRs on older versions. Can you make one based off the develop head?
I can, sure. I was offering 4.0.x because the project promises one-version-back compatibility, and this would support 3.8->4.0 compat, but if you'd rather have develop, I'll target that.
@piskvorky What are your thoughts? My opinion is that 4.0 is done and dusted, and any fixes should go to the newest version instead.
We want the patch to go to the latest (develop), yes. Thank you!
Any news on this issue? It's been quiet for a while, but I still think this one is affecting a lot of people.
@braxvan Can you say more about the cases where you or others are seeing this? Specifically:
- from what version were the models saved?
- what's your code for triggering the error?
- does the suggested patchup from above resolve the issue?
Specifically, immediately after you Doc2Vec.load(older_model_path), if you run...
model.dv = model.__dict__['docvecs']
model.dv.norms = None
..are there no further problems? (If so, we might just add those steps automatically as a fix for a future Gensim release.)
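The two patch-up steps can be wrapped in a small guarded helper so they are only applied when actually needed. This is a sketch of the workaround suggested in this thread, not Gensim's own migration code; the function name `patch_legacy_doc2vec` is hypothetical, and it assumes the model has already loaded without error.

```python
def patch_legacy_doc2vec(model):
    """Best-effort fix-up for a pre-4.0 Doc2Vec model loaded under Gensim 4.x.

    Hypothetical helper mirroring the manual workaround from this thread.
    """
    state = model.__dict__
    if "dv" not in state and "docvecs" in state:
        # Older models stored the document vectors under 'docvecs';
        # 4.x code expects them under 'dv'.
        model.dv = state["docvecs"]
    dv = state.get("dv")
    if dv is not None and "norms" not in vars(dv):
        # most_similar() calls fill_norms(), which reads self.norms;
        # None makes it recompute the norms on first use.
        dv.norms = None
    return model
```

Usage would be `model = patch_legacy_doc2vec(Doc2Vec.load(older_model_path))`, followed by the first `model.dv.most_similar(...)` call. As noted above, this may only expose further unhandled upconversion issues, and it does not help when `.load()` itself fails.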
Hello, yes, that information is below:
- The models were originally saved from Gensim 3.8.0. We did try to load and then save those models again with version 3.8.3 to upgrade them to be compatible with Gensim 4.x, however based on comments further up in this thread, it would appear that functionality does not work in version 3.8.3.
- Code wise, the only thing I am doing is loading the model to check if it can load in Gensim 4.3.2:
import os
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
# error about dv attribute immediately triggered on this line
model = Doc2Vec.load("./my-model.model")
- I tried implementing the hot-fix like below with Gensim 3.8.3 when saving the model and then loading that model with 4.x. That gets us by the original "self.dv" error but now we are seeing an error related to KeyedVectors that we are troubleshooting
import os
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
# Using gensim 3.8.3
model = Doc2Vec.load("./my-model.model")
model.dv = model.__dict__['docvecs']
model.dv.norms = None
model.save("my_new_model.model")
- Here is the exact error I am seeing:
Traceback (most recent call last):
gensim.models.doc2vec.Doc2Vec.load('tm-requirements-new.model')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/<user_name_removed>/gt3/lib64/python3.6/site-packages/gensim/models/doc2vec.py", line 801, in load
raise ae
File "/home/<user_name_removed>/gt3/lib64/python3.6/site-packages/gensim/models/doc2vec.py", line 795, in load
return super(Doc2Vec, cls).load(*args, rethrow=True, **kwargs)
File "/home/<user_name_removed>/gt3/lib64/python3.6/site-packages/gensim/models/word2vec.py", line 1929, in load
raise ae
File "/home/<user_name_removed>/gt3/lib64/python3.6/site-packages/gensim/models/word2vec.py", line 1922, in load
model = super(Word2Vec, cls).load(*args, **kwargs)
File "/home/<user_name_removed>/gt3/lib64/python3.6/site-packages/gensim/utils.py", line 487, in load
obj._load_specials(fname, mmap, compress, subname)
File "/home/<user_name_removed>/gt3/lib64/python3.6/site-packages/gensim/models/word2vec.py", line 1938, in _load_specials
super(Word2Vec, self)._load_specials(*args, **kwargs)
File "/home/<user_name_removed>/gt3/lib64/python3.6/site-packages/gensim/utils.py", line 518, in _load_specials
getattr(self, attrib)._load_specials(cfname, mmap, compress, subname)
File "/home/<user_name_removed>/gt3/lib64/python3.6/site-packages/gensim/models/keyedvectors.py", line 244, in _load_specials
self._upconvert_old_d2vkv()
File "/home/<user_name_removed>/gt3/lib64/python3.6/site-packages/gensim/models/keyedvectors.py", line 1700, in _upconvert_old_d2vkv
self.vocab = self.doctags
File "/home/<user_name_removed>/gt3/lib64/python3.6/site-packages/gensim/models/keyedvectors.py", line 654, in vocab
self.vocab() # trigger above NotImplementedError
File "/home/<user_name_removed>/gt3/lib64/python3.6/site-packages/gensim/models/keyedvectors.py", line 646, in vocab
"The vocab attribute was removed from KeyedVector in Gensim 4.0.0.\n"
AttributeError: The vocab attribute was removed from KeyedVector in Gensim 4.0.0.
Use KeyedVector's .key_to_index dict, .index_to_key list, and methods .get_vecattr(key, attr) and .set_vecattr(key, attr, new_val) instead.
See https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4
I didn't end up having time to work on this, but here are some things I figured out along the way, hopefully of use to someone else:
- In gensim/models/keyedvectors.py, in _load_specials, the "ensure at least an empty 'expandos'" if clause must happen before _upconvert_old_d2vkv. (I suspect this is the issue with the traceback above.)
- At the end of this function, call self.__dict__.pop('vocab', None) to address the possibility that _upconvert_old_vocab() has not run.
Note that the original report here was that an older Doc2Vec model loaded successfully into Gensim 4+, but then failed on a later .most_similar() operation. Thus anyone subsequently reporting a failure on initial .load() is encountering a slightly-different problem – quite possibly related, but perhaps requiring a different fix or workaround.
So: be very clear about the exact origin of any model hitting a problem, which code hits the problem, and any error messages/tracebacks encountered.
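To make that distinction easy to report, a small wrapper can record whether a failure occurs during load or on first use, along with the full traceback. This is only a sketch; `diagnose`, `load_fn`, and `use_fn` are hypothetical names, and the callables are supplied by the caller.

```python
import traceback


def diagnose(load_fn, use_fn):
    """Report whether a failure happens at load time or on first use.

    `load_fn` takes no arguments and returns a model; `use_fn` takes the
    model and exercises it. Both are hypothetical caller-supplied callables.
    """
    try:
        model = load_fn()
    except Exception:
        # Failure-on-load: a different problem from the original report.
        return {"stage": "load", "traceback": traceback.format_exc()}
    try:
        use_fn(model)
    except Exception:
        # Load succeeded but the model is broken afterwards (original report).
        return {"stage": "use", "traceback": traceback.format_exc()}
    return {"stage": "ok", "traceback": None}
```

With Gensim installed, this could be called as `diagnose(lambda: Doc2Vec.load(path), lambda m: m.dv.most_similar(tag))`, where `path` and `tag` are your own model file and a known document tag; pasting the returned traceback alongside the saving and loading Gensim versions covers the details requested above.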
Hello, yes, that information is below:
- The models were originally saved from Gensim 3.8.0. We did try to load and then save those models again with version 3.8.3 to upgrade them to be compatible with Gensim 4.x, however based on comments further up in this thread, it would appear that functionality does not work in version 3.8.3.
Unfortunately, the above (December 15, 2021) report that "Loading from 3.8.3 does not work using gensim 4.1.2" isn't clear whether it is a failure-on-load, or per original report, a broken model after load succeeds. So for your case, @braxvan, it would still be worthwhile to check whether:
- your model loads into 3.8.3, and can be re-saved from there to a new filename
- that re-saved model loads into the latest Gensim, or not. And if it loads, whether it fails in the same way on the 1st attempted .most_similar(). And if it fails-on-load, whether the error/stack is the exact same as for your original (pre-3.8.3) model.
- Code wise, the only thing I am doing is loading the model to check if it can load in Gensim 4.3.2:
import os
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
# error about dv attribute immediately triggered on this line
model = Doc2Vec.load("./my-model.model")
It'd be useful to see a full error stack for this failure on .load(original_model) – not previously reported. (Perhaps it's the same as the failure on your attempted-workaround .load('tm-requirements-new.model') – but even if so, that'd be good to fully confirm.)
- I tried implementing the hot-fix like below with Gensim 3.8.3 when saving the model and then loading that model with 4.x. That gets us by the original "self.dv" error but now we are seeing an error related to KeyedVectors that we are troubleshooting
import os
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
# Using gensim 3.8.3
model = Doc2Vec.load("./my-model.model")
model.dv = model.__dict__['docvecs']
model.dv.norms = None
model.save("my_new_model.model")
Note that this workaround was intended for the case where the model loaded without error, but then failed on a later .most_similar(). It wasn't intended as a procedure for fixing-up the model in an older (pre-4.0.0) Gensim. So I'm not surprised it's failing as a pre-save fixup.
Hi! I was trying to make Gensim 3.8.3 work with Python 3.11. I get an error about a missing longintrepr.h. The OS is Red Hat Linux. Thank you in advance for your help.
@rpolava That sounds unrelated to this issue, so you should ask about your problem, with full details of what you're attempting & what errors/blocks/failures you're seeing, in a more appropriate place, like the project discussion forum or StackOverflow (putting the gensim tag on your question). Not here, on an unrelated bug report.
As models developed with Gensim 3.8.3 could not be loaded with Gensim 4.*, I tried to run those models with Gensim 3.8.3 in Python 3.11. The suggested solution of "In gensim/models/keyedvectors.py, in _load_specials, the "ensure at least an empty 'expandos'" if clause must happen before _upconvert_old_d2vkv" resulted in the following error:
gensim/models/word2vec_inner.c:216:12: fatal error: longintrepr.h: No such file or directory
#include "longintrepr.h"
^~~~~~~~~~~~~~~
compilation terminated.
error: command '/usr/bin/gcc' failed with exit code 1
@rpolava It is still unclear to me what edits you attempted, in order to follow someone else's (not mine, but @thatandromeda's) suggestions. From the earlier reports, I don't see how anything related to expandos would affect the missing dv upconversion, though I suppose there may be some indirect effects, for example via some un-noticed error happening earlier preventing intended operations.
It's also unclear to me what later code or operation of yours is then triggering this error. (Is it a project-compile or pip install that's triggering some compilation?) So still: the relation to the original issue is unclear to me. Notably, in a normal local install, no edits to pure-Python files like keyedvectors.py should trigger any recompilations & thus compilation errors. (So: if you're getting gcc errors here, it's possible they're unrelated to your bug-specific edits, and instead inherent to trying a build, for the 1st time, in an environment that lacks some necessary pieces – and you'd get the same errors trying any fresh Gensim build in your environment – again making the error non-germane to the .dv issue.)
So even if you are on a quest to resolve the "has no attribute 'dv'" issue, you're going to have to be a lot more clear about what you're trying – maybe via a PR showing your code changes? – and why, and what's being done to trigger your new-and-different compilation problem, for me to have any chance of understanding and helping.
Further, I wouldn't try any fixes via editing Gensim code until after confirming that similar patchups, performed manually/explicitly outside Gensim code, work to fix the issue. Has that been confirmed? How?
Unfortunately, the above (December 15, 2021) report that "Loading from 3.8.3 does not work using gensim 4.1.2" isn't clear whether it is a failure-on-load, or per original report, a broken model after load succeeds. So for your case, @braxvan, it would still be worthwhile to check whether:
- your model loads into 3.8.3, and can be re-saved from there to a new filename
- We did verify this. We loaded our 3.8.0 model into Gensim 3.8.3 and then saved it as "my-model-new.model" and it saved without issue.
- that re-saved model loads into the latest Gensim, or not. And if it loads, if it fails in the same way on the 1st attempted .most_similar(). And if it fails-on-load, whether the error/stack is the exact same as your original (pre-3.8.3) model.
It'd be useful to see a full error stack for this failure on .load(original_model) – not previously reported. (Perhaps it's the same as the failure on your attempted-workaround .load('tm-requirements-new.model') – but even if so, that'd be good to fully confirm.)
- The resaved model does not load into the latest version of Gensim with Python 3.11.
- Here is the stack trace for the original Gensim 3.8.0 model when we attempt to load it in Gensim 4.3.2:
Traceback (most recent call last):
File "C:\Users\<username_removed>\Desktop\test_model\test.py", line 5, in <module>
model = Doc2Vec.load("./<model_name>.model")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\Anaconda3\envs\gensim4\Lib\site-packages\gensim\models\doc2vec.py", line 816, in load
raise ae
File "C:\ProgramData\Anaconda3\envs\gensim4\Lib\site-packages\gensim\models\doc2vec.py", line 809, in load
model = super(Doc2Vec, cls).load(*args, rethrow=True, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\Anaconda3\envs\gensim4\Lib\site-packages\gensim\models\word2vec.py", line 1960, in load
raise ae
File "C:\ProgramData\Anaconda3\envs\gensim4\Lib\site-packages\gensim\models\word2vec.py", line 1953, in load
model = super(Word2Vec, cls).load(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\Anaconda3\envs\gensim4\Lib\site-packages\gensim\utils.py", line 487, in load
obj._load_specials(fname, mmap, compress, subname)
File "C:\ProgramData\Anaconda3\envs\gensim4\Lib\site-packages\gensim\models\word2vec.py", line 1969, in _load_specials
super(Word2Vec, self)._load_specials(*args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\gensim4\Lib\site-packages\gensim\utils.py", line 518, in _load_specials
getattr(self, attrib)._load_specials(cfname, mmap, compress, subname)
^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\Anaconda3\envs\gensim4\Lib\site-packages\gensim\utils.py", line 1522, in new_func1
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\Anaconda3\envs\gensim4\Lib\site-packages\gensim\models\doc2vec.py", line 328, in docvecs
return self.dv
^^^^^^^
AttributeError: 'Doc2Vec' object has no attribute 'dv'. Did you mean: 'dm'?
- And here is the stack trace for the Gensim 3.8.0 model that was re-saved with Gensim 3.8.3 when we attempt to load in Gensim 4.3.2:
Traceback (most recent call last):
File "C:\Users\<username_removed>\Desktop\test_model\test.py", line 5, in <module>
model = Doc2Vec.load("./<model_name_new>.model")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\Anaconda3\envs\gensim4\Lib\site-packages\gensim\models\doc2vec.py", line 817, in load
raise ae
File "C:\ProgramData\Anaconda3\envs\gensim4\Lib\site-packages\gensim\models\doc2vec.py", line 810, in load
model = super(Doc2Vec, cls).load(*args, rethrow=True, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\Anaconda3\envs\gensim4\Lib\site-packages\gensim\models\word2vec.py", line 1960, in load
raise ae
File "C:\ProgramData\Anaconda3\envs\gensim4\Lib\site-packages\gensim\models\word2vec.py", line 1953, in load
model = super(Word2Vec, cls).load(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\Anaconda3\envs\gensim4\Lib\site-packages\gensim\utils.py", line 487, in load
obj._load_specials(fname, mmap, compress, subname)
File "C:\ProgramData\Anaconda3\envs\gensim4\Lib\site-packages\gensim\models\word2vec.py", line 1969, in _load_specials
super(Word2Vec, self)._load_specials(*args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\gensim4\Lib\site-packages\gensim\utils.py", line 518, in _load_specials
getattr(self, attrib)._load_specials(cfname, mmap, compress, subname)
^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\Anaconda3\envs\gensim4\Lib\site-packages\gensim\utils.py", line 1522, in new_func1
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\Anaconda3\envs\gensim4\Lib\site-packages\gensim\models\doc2vec.py", line 329, in docvecs
return self.dv
^^^^^^^
AttributeError: 'Doc2Vec' object has no attribute 'dv'. Did you mean: 'dm'?
Note that this workaround was intended for the case where the model loaded without error, but then failed on a later .most_similar(). It wasn't intended as a procedure for fixing-up the model in an older (pre-4.0.0) Gensim. So not surprised it's failing as a pre-save fixup.
Thanks for all the help so far, we have several large models that we would really prefer not to train again if possible!