2 comments of TianxingHe
You can use my code for the projected softmax:
```
if compute_full_outp == True:
    out_full_logps = [head_logprob[:, :self.cutoffs[0]]]
    offset = 0
    cutoff_values = [0] + self.cutoffs
    for i in range(1,...
```
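The snippet in the comment is truncated. As a hedged illustration of what computing full-vocabulary log-probs from a projected adaptive softmax involves, here is a minimal NumPy sketch. It assumes a head layer whose first `cutoffs[0]` columns are shortlist words, followed by one "cluster" column per tail cluster, and a separate projection matrix per cluster. The function `full_log_probs` and all parameter names are hypothetical; this is not the commenter's code or any library's API.

```python
import numpy as np

def log_softmax(x):
    # Numerically stable log-softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

def full_log_probs(hidden, head_proj, head_weight, tail_projs, tail_weights, cutoffs):
    """Full-vocabulary log-probs from a projected adaptive softmax (sketch).

    hidden:      (N, d_model) final hidden states
    head_proj:   (d_model, d_0) projection for the head
    head_weight: (d_0, cutoffs[0] + n_tail); the last n_tail columns are
                 one cluster token per tail cluster (assumed layout)
    tail_projs[i], tail_weights[i]: projection and output layer of tail cluster i + 1
    cutoffs:     [c0, c1, ..., V], where cutoffs[-1] is the vocabulary size
    """
    head_logprob = log_softmax(hidden @ head_proj @ head_weight)
    # Shortlist words take their head log-prob directly.
    out = [head_logprob[:, :cutoffs[0]]]
    for i in range(1, len(cutoffs)):
        # log p(cluster i), read from the head's i-th cluster column.
        cluster_logprob = head_logprob[:, cutoffs[0] + i - 1][:, None]
        # log p(word | cluster i) over vocabulary ids [cutoffs[i-1], cutoffs[i]).
        tail_logprob = log_softmax(hidden @ tail_projs[i - 1] @ tail_weights[i - 1])
        # log p(word) = log p(cluster) + log p(word | cluster)
        out.append(cluster_logprob + tail_logprob)
    return np.concatenate(out, axis=1)

# Usage with random weights: 4 positions, d_model = 8, vocabulary of 20.
rng = np.random.default_rng(0)
cutoffs = [5, 12, 20]
lp = full_log_probs(
    rng.normal(size=(4, 8)),
    rng.normal(size=(8, 6)), rng.normal(size=(6, 5 + 2)),
    [rng.normal(size=(8, 4)), rng.normal(size=(8, 2))],
    [rng.normal(size=(4, 7)), rng.normal(size=(2, 8))],
    cutoffs,
)
print(lp.shape)  # (4, 20)
print(np.allclose(np.exp(lp).sum(axis=1), 1.0))  # True
```

Because the head's softmax already normalizes over shortlist words and cluster tokens together, and each tail distribution sums to 1, the concatenated probabilities sum to 1 over the whole vocabulary.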
Thanks for the reply! I now understand that Transformer-XL doesn't need to recompute the hidden states of previous segments. Are you implying that, for the vanilla Transformer LM, there's a large overlap between mini-batches (so that each...
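The comment is cut off, so the following only illustrates the general idea of overlapping evaluation windows, not necessarily the commenter's intended point. It is a minimal Python sketch, assuming a fixed-context Transformer LM where each window re-encodes `context_len - stride` tokens it has already seen and scores only the new ones. The function `sliding_windows` and its parameters are hypothetical.

```python
def sliding_windows(num_tokens, context_len, stride):
    """Return (start, end, score_from) spans for overlapping evaluation windows.

    Each window feeds tokens [start, end) to the model, but only tokens
    [score_from, end) are scored, so every token is scored exactly once and,
    after the first window, sees at least context_len - stride tokens of context.
    """
    assert 0 < stride <= context_len
    windows = []
    prev_end = 0  # first token not yet scored
    start = 0
    while prev_end < num_tokens:
        end = min(start + context_len, num_tokens)
        windows.append((start, end, prev_end))
        prev_end = end
        start += stride
    return windows

spans = sliding_windows(10, context_len=4, stride=2)
print(spans)  # [(0, 4, 0), (2, 6, 4), (4, 8, 6), (6, 10, 8)]
# Tokens actually encoded vs. tokens scored: the overlap is recomputed work.
print(sum(e - s for s, e, _ in spans), "encoded for", 10, "scored")  # 16 encoded for 10 scored
```

With `stride=1`, which is how the Transformer-XL paper describes evaluating the vanilla model, every scored token costs a full forward pass over `context_len` tokens. Transformer-XL avoids that recomputation by caching the previous segment's hidden states and attending to them as memory.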