marton-avrios
I implemented 2D relative attention in T5 as a bias term that is added before softmax. I would like to do the same for ReformerLM and later also to an...
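For reference, here is a rough sketch of what "a bias term added before softmax" can look like. It is plain Python, not the T5 or ReformerLM code. The names `relative_bias_2d` and `attention_weights_with_bias`, the `(row_offset, col_offset)` lookup table, the single head, and the `1/sqrt(d)` scaling are all assumptions made for illustration.

```python
import math


def softmax(row):
    # Numerically stable softmax over one list of logits.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def relative_bias_2d(height, width, table):
    # Hypothetical helper: build an [n][n] bias matrix for the
    # n = height * width positions of a grid, flattened row by row.
    # `table` maps (row_offset, col_offset) -> scalar bias and stands in
    # for a learned table; offsets missing from it get a bias of 0.0.
    coords = [(r, c) for r in range(height) for c in range(width)]
    return [
        [table.get((r2 - r1, c2 - c1), 0.0) for (r2, c2) in coords]
        for (r1, c1) in coords
    ]


def attention_weights_with_bias(q, k, bias):
    # q: [n][d] queries, k: [m][d] keys, bias: [n][m].
    # The bias is added to the dot-product logits before the softmax.
    # The 1/sqrt(d) scaling is an assumption of this sketch.
    scale = 1.0 / math.sqrt(len(q[0]))
    weights = []
    for i, q_i in enumerate(q):
        logits = [
            scale * sum(a * b for a, b in zip(q_i, k_j)) + bias[i][j]
            for j, k_j in enumerate(k)
        ]
        weights.append(softmax(logits))
    return weights
```

Because the bias only shifts the logits, the same matrix could in principle be added in any attention layer that exposes its pre-softmax scores. Whether ReformerLM exposes them is not something this sketch assumes.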
You should also specify `total_train_steps` here: https://github.com/tensorflow/mesh/blob/d91460615e32cf13077f94a868a8324f63fe758e/mesh_tensorflow/transformer/utils.py#L672-L676
I am currently trying to implement returning logits along with the predictions from `sample_autoregressive` so that I can calculate a score from them. However, the score calculated from these logits is slightly different from the...
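To make the comparison concrete, this is a minimal sketch of one common way to turn per-step logits into a sequence score: sum the log-softmax probability of each sampled token. It assumes that is the intended definition of "score". The function names are hypothetical and nothing here is taken from `sample_autoregressive` itself.

```python
import math


def log_softmax(logits):
    # Numerically stable log-softmax over one decoding step's logits.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]


def sequence_score(step_logits, token_ids):
    # step_logits: one list of vocabulary logits per decoding step.
    # token_ids: the token chosen at each step.
    # Returns the sum of log-probabilities of the chosen tokens.
    return sum(
        log_softmax(logits)[token]
        for logits, token in zip(step_logits, token_ids)
    )
```

For example, `sequence_score([[0.0, 0.0], [math.log(3.0), 0.0]], [0, 0])` equals `log(0.5) + log(0.75)`.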