awd-lstm-lm
model.decoder is never used?
I started to suspect something was wrong when the `generate.py` script crashed. Then I was surprised to see that the line `output, hidden = model(input, hidden)` yields an `output` variable with the hidden size of the last recurrent layer, not the size of the vocabulary. So I took a closer look at `model.py` and was surprised to see that `self.decoder` is not used at all!

If I understood something wrong, correct me, but right now it seems that this should not work at all (at least if `tie_weights` is not used).
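To make the issue concrete, here is a simplified sketch of the pattern I'm describing (not the exact code from `model.py`, just the shape of it):

```python
import torch.nn as nn

class RNNModel(nn.Module):
    # Simplified stand-in for the repo's model, assuming an LSTM backbone.
    def __init__(self, ntoken, ninp, nhid, tie_weights=False):
        super().__init__()
        self.encoder = nn.Embedding(ntoken, ninp)
        self.rnn = nn.LSTM(ninp, nhid)
        self.decoder = nn.Linear(nhid, ntoken)  # defined here...
        if tie_weights:
            # weight tying requires nhid == ninp
            self.decoder.weight = self.encoder.weight

    def forward(self, input, hidden):
        emb = self.encoder(input)
        output, hidden = self.rnn(emb, hidden)
        # ...but never applied here, so `output` has size nhid, not ntoken
        return output, hidden
```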
The decoder is used in the loss function `criterion`, which takes the decoder weights as input. It seems that the decoder doesn't receive gradients / get updated (except through weight tying)...
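Concretely, training passes the decoder's parameters straight into the criterion rather than calling `model.decoder` in the forward pass. A sketch from memory (argument names may not match the repo exactly):

```python
# The criterion (SplitCrossEntropyLoss, if I remember right) consumes the
# decoder weights directly; model.decoder itself is never called.
output, hidden = model(data, hidden)
loss = criterion(model.decoder.weight, model.decoder.bias, output, targets)
loss.backward()
```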
My question was actually more about sampling, since I noticed the issue in the `generate.py` script; training was okay (mysterious crashes aside, it succeeded).
I agree with @mojesty - without the decoder step in `generate.py`, it currently only samples from the first 400 (depending on embedding layer params, assuming tied weights) tokens in the corpus. It should be fixed by replacing `output` with `model.decoder(output)` on line 70.
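For concreteness, a sketch of the fixed sampling step (the surrounding lines are reconstructed from the stock `generate.py` sampling loop and may differ slightly from the repo):

```python
# Around line 70 of generate.py: project the RNN output to vocabulary size
# before sampling. Surrounding code is a reconstruction, not the exact file.
output, hidden = model(input, hidden)
logits = model.decoder(output)  # hidden size -> vocabulary size
word_weights = logits.squeeze().data.div(args.temperature).exp().cpu()
word_idx = torch.multinomial(word_weights, 1)[0]
input.data.fill_(word_idx)
```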