awd-lstm-lm
model.decoder is never used?
I started to suspect something was wrong when the `generate.py` script crashed. Then I was surprised to see that the line `output, hidden = model(input, hidden)` yields an `output` variable with the hidden size of the last recurrent layer, not the size of the vocabulary. So I took a closer look at `model.py` and was surprised to see that `self.decoder` is not used at all!

If I understood something wrong, correct me, but right now it seems that this should not work at all (at least if `tie_weights` is not used).
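To make the issue concrete, here is a simplified sketch of the pattern I'm describing (not the exact code from `model.py`, just the shape of it):

```python
import torch.nn as nn

class RNNModel(nn.Module):
    # Simplified stand-in for the repo's model, assuming an LSTM backbone.
    def __init__(self, ntoken, ninp, nhid, tie_weights=False):
        super().__init__()
        self.encoder = nn.Embedding(ntoken, ninp)
        self.rnn = nn.LSTM(ninp, nhid)
        self.decoder = nn.Linear(nhid, ntoken)  # defined here...
        if tie_weights:
            # weight tying requires nhid == ninp
            self.decoder.weight = self.encoder.weight

    def forward(self, input, hidden):
        emb = self.encoder(input)
        output, hidden = self.rnn(emb, hidden)
        # ...but never applied here, so `output` has size nhid, not ntoken
        return output, hidden
```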
The decoder is used in the loss function `criterion`, which takes the decoder weights as input. It seems that the decoder doesn't receive gradients / get updated (except through weight tying)...
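Concretely, training passes the decoder's parameters straight into the criterion rather than calling `model.decoder` in the forward pass. A sketch from memory (argument names may not match the repo exactly):

```python
# The criterion (SplitCrossEntropyLoss, if I remember right) consumes the
# decoder weights directly; model.decoder itself is never called.
output, hidden = model(data, hidden)
loss = criterion(model.decoder.weight, model.decoder.bias, output, targets)
loss.backward()
```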
My question was actually more about sampling, since I noticed the issue in the `generate.py` script; training was okay (mysterious crashes aside, it succeeded).
I agree with @mojesty - without the decoder step in `generate.py`, it currently only samples from the first 400 (depending on embedding layer params, assuming tied weights) tokens in the corpus. It should be fixed by replacing `output` with `model.decoder(output)` on line 70.
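For concreteness, a sketch of the fixed sampling step (the surrounding lines are reconstructed from the stock `generate.py` sampling loop and may differ slightly from the repo):

```python
# Around line 70 of generate.py: project the RNN output to vocabulary size
# before sampling. Surrounding code is a reconstruction, not the exact file.
output, hidden = model(input, hidden)
logits = model.decoder(output)  # hidden size -> vocabulary size
word_weights = logits.squeeze().data.div(args.temperature).exp().cpu()
word_idx = torch.multinomial(word_weights, 1)[0]
input.data.fill_(word_idx)
```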