Parameter cutoffs in the function single_core_graph

Open zhuolinumd opened this issue 5 years ago • 3 comments

@kimiyoung Can you explain the meaning and usage of the parameter cutoffs in the function single_core_graph? Can you provide some examples? Thanks

zhuolinumd avatar Jul 09 '19 19:07 zhuolinumd

Hi, an answer from the authors would be more authoritative, but I have looked into Adaptive Softmax, so I can comment on this and hopefully help.

cutoffs partitions the vocabulary into groups by word frequency, which assumes that token IDs are sorted from the most frequent word to the least frequent. E.g., [0, 20000, 40000, 200000, 267735] means that Group_1 contains the 20,000 most common words (IDs 0 to 19,999), whereas Group_4 contains the 267,735 - 200,000 = 67,735 rarest words.
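To make the partition concrete, here is a minimal sketch of how a token ID maps to its group. It is only an illustration, not code from the transformer-xl repository: the helper `group_of` is a hypothetical name, and it assumes token IDs are sorted by descending frequency.

```python
from bisect import bisect_right

# Boundaries from the example above; token IDs are assumed to be sorted
# by descending frequency (ID 0 is the most common word).
cutoffs = [0, 20000, 40000, 200000, 267735]


def group_of(token_id, cutoffs):
    """Return the 1-based index of the group whose [start, end) range holds token_id."""
    if not cutoffs[0] <= token_id < cutoffs[-1]:
        raise ValueError("token_id is outside the vocabulary")
    # bisect_right counts how many boundaries are <= token_id.
    return bisect_right(cutoffs, token_id)


# Size of each group: the difference between consecutive boundaries.
group_sizes = [end - start for start, end in zip(cutoffs, cutoffs[1:])]

print(group_of(0, cutoffs), group_of(25000, cutoffs), group_of(267734, cutoffs))
print(group_sizes)
```

Each boundary is the first ID of the next group, so ID 20000 already belongs to Group_2.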

Why these groups? To speed up the computation of softmax-like probabilities, at the cost of a minor drop in accuracy. As you know, softmax is computed on the logits as: softmax(x_i) = e^{x_i} / \sum_{j}{ e^{x_j} }, where the sum runs over every word j in the vocabulary (including i), which is expensive when the vocabulary is large. Instead of evaluating that full sum for every word, Adaptive Softmax first assigns a part of the probability distribution (e.g. 0.3) to a group. Each word in the group is then assigned its own probability as a share of its group's mass: its final probability is the group's probability (e.g. 0.3) multiplied by the word's probability within the group.
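The two-level idea can be sketched in a few lines of plain Python. This is a toy illustration under my own assumptions, not the repository's implementation: the function `adaptive_softmax_probs` and its arguments are hypothetical, the most frequent words are assumed to get their logits directly in a "head" softmax, and each remaining group is assumed to be represented there by a single extra logit.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_softmax_probs(head_logits, tail_logits, n_head_words):
    """Combine a head softmax with per-group softmaxes.

    head_logits: n_head_words word logits followed by one logit per tail group.
    tail_logits: one list of word logits per tail group.
    """
    head = softmax(head_logits)
    probs = head[:n_head_words]  # frequent words get their probability directly
    for group_prob, logits in zip(head[n_head_words:], tail_logits):
        # P(word) = P(group) * P(word | group)
        probs.extend(group_prob * p for p in softmax(logits))
    return probs


# Toy vocabulary of 6 words with cutoffs [0, 2, 4, 6]:
# 2 head words plus 2 tail groups of 2 words each.
head_logits = [2.0, 1.0, 0.5, 0.0]  # 2 word logits + 2 group logits
tail_logits = [[0.3, -0.3], [0.0, 0.0]]
probs = adaptive_softmax_probs(head_logits, tail_logits, n_head_words=2)
print(probs)
```

The probabilities of the words in a group add up to the mass that the head softmax gave to that group, so the full distribution still sums to 1. The saving comes from only evaluating the softmax of a rare group when a word from that group is actually needed.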

AndreaLK3 avatar Jul 10 '19 14:07 AndreaLK3

Thank you @AndreaLK3

zhuolinumd avatar Jul 10 '19 14:07 zhuolinumd

Thank you @AndreaLK3

LeeSureman avatar Oct 07 '19 11:10 LeeSureman