sha-rnn SplitcrossEntropy

SplitcrossEntropy

Open gslaller opened this issue 5 years ago • 1 comment

Can you provide any further information on the loss function you are using? Perhaps a reference to a paper?

Jan 03 '20 16:01 gslaller

@gslaller - Seems to be from here: Efficient softmax approximation for GPUs

See: https://twitter.com/Smerity/status/1343159498081366017

The SHA-RNN paper itself only uses it because it was already part of AWD-LSTM. It's the adaptive softmax from the linked FAIR paper, and almost all Facebook (FAIR) codebases use it. It's essentially a computationally efficient hierarchical softmax. Hope that helps! https://arxiv.org/abs/1609.04309
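To make the "efficient hierarchical softmax" idea concrete, here is a minimal pure-Python sketch of a two-level softmax in the spirit of that paper. It is not the code from this repository or from AWD-LSTM. The class name `AdaptiveSoftmaxSketch`, the cutoffs, and the toy sizes are all made up for illustration, and it assumes word ids are sorted by decreasing frequency. It also skips the reduced-dimension projection the paper applies to the rare-word clusters.

```python
import math
import random


def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class AdaptiveSoftmaxSketch:
    """Hypothetical two-level softmax: frequent words live in the head,
    rare words live in tail clusters that are only scored when needed."""

    def __init__(self, hidden_size, vocab_size, cutoffs, seed=0):
        rng = random.Random(seed)
        # Cluster boundaries, e.g. [0, 4, 8, 12] for cutoffs [4, 8].
        self.bounds = [0] + list(cutoffs) + [vocab_size]
        self.shortlist = cutoffs[0]
        n_tails = len(self.bounds) - 2

        def rand_rows(n):
            return [[rng.gauss(0.0, 0.1) for _ in range(hidden_size)]
                    for _ in range(n)]

        # The head scores the shortlist words plus one entry per tail cluster.
        self.head = rand_rows(self.shortlist + n_tails)
        # One (randomly initialised) output matrix per tail cluster.
        self.tails = [rand_rows(self.bounds[i + 2] - self.bounds[i + 1])
                      for i in range(n_tails)]

    def log_prob(self, hidden, word):
        head_lp = log_softmax([dot(w, hidden) for w in self.head])
        if word < self.shortlist:
            # Frequent word: only the small head softmax is computed.
            return head_lp[word]
        for i, tail in enumerate(self.tails):
            lo, hi = self.bounds[i + 1], self.bounds[i + 2]
            if lo <= word < hi:
                # Rare word: log P(cluster) + log P(word | cluster).
                tail_lp = log_softmax([dot(w, hidden) for w in tail])
                return head_lp[self.shortlist + i] + tail_lp[word - lo]
        raise ValueError("word id out of range")


# Toy usage: 12-word vocabulary, 4 frequent words, two tail clusters of 4.
model = AdaptiveSoftmaxSketch(hidden_size=3, vocab_size=12, cutoffs=[4, 8])
hidden = [0.5, -1.0, 2.0]
loss = -model.log_prob(hidden, word=9)  # negative log-likelihood of word 9
```

The saving comes from the fact that most target words are frequent, so most of the time only the small head softmax is evaluated instead of one over the full vocabulary. If you want a ready-made version, PyTorch ships `torch.nn.AdaptiveLogSoftmaxWithLoss`, which implements the method from the same paper.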

Dec 27 '20 13:12 munael
