gensim
Not possible to continue training with hs=1...
Hi, I have run into this error multiple times now, so I want to report it.
It seems it is not possible to continue training if `hs=1`.
```
File "C:\Users\\Anaconda3\envs\tf2\lib\site-packages\gensim\models\fasttext.py", line 617, in build_vocab
    keep_raw_vocab=keep_raw_vocab, trim_rule=trim_rule, **kwargs)
File "C:\Users\\Anaconda3\envs\tf2\lib\site-packages\gensim\models\base_any2vec.py", line 929, in build_vocab
    self.trainables.prepare_weights(self.hs, self.negative, self.wv, update=update, vocabulary=self.vocabulary)
File "C:\Users\\Anaconda3\envs\tf2\lib\site-packages\gensim\models\fasttext.py", line 1021, in prepare_weights
    super(FastTextTrainables, self).prepare_weights(hs, negative, wv, update=update, vocabulary=vocabulary)
File "C:\Users\\Anaconda3\envs\tf2\lib\site-packages\gensim\models\word2vec.py", line 1689, in prepare_weights
    self.update_weights(hs, negative, wv)
File "C:\Users\\Anaconda3\envs\tf2\lib\site-packages\gensim\models\word2vec.py", line 1734, in update_weights
    self.syn1 = vstack([self.syn1, zeros((gained_vocab, self.layer1_size), dtype=REAL)])
AttributeError: 'FastTextTrainables' object has no attribute 'syn1'
```
Example:

```python
model = FastText(sg=1, hs=1, min_n=5, max_n=5, workers=4, ns_exponent=0.75,
                 iter=5, alpha=0.025, window=5, size=300, negative=10, min_count=1)
model.build_vocab(sentences=sentences)
total_examples = model.corpus_count
model.train(sentences=sentences, total_examples=total_examples, epochs=model.epochs)

model.build_vocab(sentnews, update=True)
total_examples = model.corpus_count
model.train(sentences=sentnews, total_examples=total_examples, epochs=5)
```
You probably don't want to have both `hs=1` and `negative=10` – both modes enabled – at once. (Typically either one mode or the other dominates for a certain corpus/goal, with `negative` tending to perform better on larger corpora.) Does using `hs=1, negative=0` trigger the error?
Also, growing the vocabulary (thus changing the HS-mode word-encodings) while retaining some HS-layer trained values may be nonsensical – so I'd suspect use of `build_vocab(..., update=True)` is likely to work better with negative-sampling than with HS mode.
Still, I wouldn't expect this error. What gensim version are you using, and can you make a completely self-contained test, with a tiny amount of dummy data, that reproduces the same error?
> You probably don't want to have both `hs=1` and `negative=10` – both modes enabled – at once.

Why not? Isn't it HS with negative sampling?
> Also, growing the vocabulary (thus changing the HS-mode word-encodings) while retaining some HS-layer trained values may be nonsensical - so I'd suspect use of `build_vocab(..., update=True)` is likely to work better.

This I do not understand. What do you mean? Isn't that what I am trying to do? It is the structure for continuing training from your docs.
> You probably don't want to have both `hs=1` and `negative=10` – both modes enabled – at once.
>
> Why not? Isn't it HS with negative sampling?
No; the `hs` ('hierarchical softmax') and `negative` ('negative sampling') options are distinct methods, and the `hs` mode has no use for a `negative` parameter. If both are non-zero, two separate output networks are assembled, sharing the same input vectors, each being trained on (& being a source of backpropagated adjustments from) each text.
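To make the distinction concrete, here is a toy sketch of the two separate output computations sharing one input vector. All vectors, counts, and the sign convention for Huffman-code bits are made-up illustrative values, not gensim internals:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# One shared input ("projection") vector for the center word.
input_vec = [0.2, -0.1, 0.4]

# Hierarchical softmax: one weight vector per inner node on the target
# word's Huffman path; each code bit says which branch is "correct".
# (Toy convention here: bit 1 means positive label at that node.)
hs_node_vecs = [[0.1, 0.0, 0.3], [-0.2, 0.5, 0.1]]
hs_code = [1, 0]
hs_loss = -sum(
    math.log(sigmoid((1.0 if bit else -1.0) * dot(input_vec, node)))
    for node, bit in zip(hs_node_vecs, hs_code)
)

# Negative sampling: one weight vector per vocabulary word; the loss
# contrasts the true target word against k randomly drawn "negatives".
target_vec = [0.3, 0.2, -0.1]
negative_vecs = [[0.0, -0.4, 0.2], [0.5, 0.1, 0.0]]
ns_loss = -math.log(sigmoid(dot(input_vec, target_vec))) - sum(
    math.log(sigmoid(-dot(input_vec, neg))) for neg in negative_vecs
)

# Two independent losses from two independent output layers; with both
# hs=1 and negative>0, both would backpropagate into input_vec.
print(hs_loss, ns_loss)
```

Nothing ties the two losses together except the shared input vector, which is why enabling both modes means training two output networks rather than one combined method.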
> Also, growing the vocabulary (thus changing the HS-mode word-encodings) while retaining some HS-layer trained values may be nonsensical - so I'd suspect use of `build_vocab(..., update=True)` is likely to work better.
>
> This I do not understand. What do you mean? Isn't it what I try to do? It is the structure for continue training from your docs.
Sorry, that should have read "likely to work better with negative-sampling." Yes, the library allows vocab-expansion with HS mode, but it might not be a good idea compared to alternatives (like many other allowed operations/parameters).
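The "changing the HS-mode word-encodings" point can be illustrated with a small, self-contained sketch. The Huffman-building helper and word counts below are hypothetical, not gensim code, but they show the underlying effect: adding one word to a frequency-based Huffman tree can change the codes (and therefore the inner-node paths) of words that were already trained:

```python
import heapq
import itertools

def huffman_codes(freqs):
    """Build Huffman codes ('0'/'1' strings) from a word -> count dict."""
    counter = itertools.count()  # unique tie-breaker so the heap never compares dicts
    heap = [(count, next(counter), {word: ""}) for word, count in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        c1, _, codes1 = heapq.heappop(heap)
        c2, _, codes2 = heapq.heappop(heap)
        merged = {w: "0" + c for w, c in codes1.items()}
        merged.update({w: "1" + c for w, c in codes2.items()})
        heapq.heappush(heap, (c1 + c2, next(counter), merged))
    return heap[0][2]

# Hypothetical word counts before and after a vocabulary update.
before = huffman_codes({"the": 50, "cat": 20, "sat": 10})
after = huffman_codes({"the": 50, "cat": 20, "sat": 10, "mat": 30})

# Existing words can end up with different codes after the update,
# so HS-layer weights trained against the old paths no longer line up.
print(before)
print(after)
```

Since each HS output-layer weight vector belongs to an inner tree node, retrained words whose paths changed would be paired with node weights learned for entirely different words, which is the "may be nonsensical" concern above.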
@gojomo I get the error reported by @datistiquo in this minimal Colab notebook, where I'm trying to retrain a pretrained model...
Looking at that notebook, I see no such error - training appears to occur without error.
(Separately: it'd be good to correct the deprecation-warning that notebook is getting, as it suggests the exact new method that should be called instead. And, I personally would not assume such incremental-training of another model with a relatively-small amount of new data is a good idea. Any words/fragments well-represented in the new data might move arbitrarily-far to new coordinates, checked only by the offsetting influence of other words/fragments in the new data. A potentially much-larger number of words/fragments in the older data will stay put, and thus gradually lose meaningful comparability with your moved vectors. Unless doing some before-and-after quality checks on the whole model, over a larger range of probes than just your new data, there's no telling how large such an effect could be, or whether starting from someone else's model is a benefit or hindrance to your ultimate goals.)
Ah sorry, I ended up correcting the syntax in the notebook!
The data here is just a toy example to show the syntax to a student. That said, I appreciate your words of wisdom on the deprecation warning @gojomo!
> You probably don't want to have both `hs=1` and `negative=10` – both modes enabled – at once.
>
> Why not? Isn't it HS with negative sampling?
>
> No; the `hs` ('hierarchical softmax') and `negative` ('negative sampling') options are distinct methods, and the `hs` mode has no use for a `negative` parameter. If both are non-zero, two separate output networks are assembled, sharing the same input vectors, each being trained on (& being a source of backpropagated adjustments from) each text.
>
> Also, growing the vocabulary (thus changing the HS-mode word-encodings) while retaining some HS-layer trained values may be nonsensical - so I'd suspect use of `build_vocab(..., update=True)` is likely to work better.
>
> This I do not understand. What do you mean? Isn't it what I try to do? It is the structure for continue training from your docs.
>
> Sorry, that should have read "likely to work better with negative-sampling." Yes, the library allows vocab-expansion with HS mode, but it might not be a good idea compared to alternatives (like many other allowed operations/parameters).
Thanks for your explanation. I was indeed confused when I first noticed that both keywords can be set to non-zero values. Maybe a note on this could be added to the docs, informing users that the two methods are actually trained as separate output networks rather than combined into a single training objective.