Potential Issue with EncoderDecoder Model with CrossEntropy Loss
Hi,
I see that the LSTM Attention Decoder applies log_softmax to the step outputs inside the model. However, the cross-entropy loss criterion uses nn.CrossEntropyLoss, which applies another log_softmax internally. Shouldn't nn.NLLLoss be used instead of nn.CrossEntropyLoss? This could cause issues in models such as ConformerLSTMModel.
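To illustrate what I mean, here is a minimal sketch of the situation (the tensor names and shapes are made up for the example, not the actual openspeech code): the decoder already emits log-probabilities, nn.CrossEntropyLoss applies log_softmax a second time, while nn.NLLLoss consumes log-probabilities directly.

```python
import torch
import torch.nn as nn

batch_size, num_classes = 4, 10
step_logits = torch.randn(batch_size, num_classes)

# The decoder already applies log_softmax to its step outputs inside the model.
decoder_log_probs = torch.log_softmax(step_logits, dim=-1)
targets = torch.randint(0, num_classes, (batch_size,))

# nn.CrossEntropyLoss applies log_softmax internally a second time ...
ce_loss = nn.CrossEntropyLoss()(decoder_log_probs, targets)

# ... whereas nn.NLLLoss expects log-probabilities as input.
nll_loss = nn.NLLLoss()(decoder_log_probs, targets)

print(ce_loss.item(), nll_loss.item())
```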
As far as I know, nn.CrossEntropyLoss checks whether log_softmax has already been applied and decides whether to apply it again. Also, because log_softmax(log_softmax(x)) = log_softmax(x), the result is the same.
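A quick numerical check of the idempotence claim (a standalone sketch, not openspeech code): applying log_softmax twice gives the same tensor as applying it once, so the two loss criteria agree when fed log-probabilities.

```python
import torch
import torch.nn.functional as F

# log_softmax is idempotent: its output already exponentiates to a distribution,
# so logsumexp of that output is 0 and a second application changes nothing.
x = torch.randn(4, 10)
once = torch.log_softmax(x, dim=-1)
twice = torch.log_softmax(once, dim=-1)
print(torch.allclose(once, twice))  # True

# Consequently, cross_entropy on log-probabilities matches nll_loss on them.
targets = torch.randint(0, 10, (4,))
ce = F.cross_entropy(once, targets)
nll = F.nll_loss(once, targets)
print(torch.allclose(ce, nll))  # True
```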