marton-avrios
Results: 2 comments of marton-avrios
For the basic relative attention scenario, I add one of N different learned scalars (N=32 for T5) to each query-key dot product, based on the relative distance of the corresponding...
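To make the idea concrete, here is a minimal pure-Python sketch of adding a distance-dependent scalar to each query-key logit. It is an illustration only: the helper names `relative_bucket` and `add_relative_bias` are hypothetical, the bias table holds random numbers standing in for learned parameters, and the bucketing simply clips the signed distance, which is an assumption made for brevity and not necessarily how T5 assigns distances to its 32 buckets.

```python
import random


def relative_bucket(query_pos, key_pos, num_buckets):
    """Map a signed relative distance to a bucket index in [0, num_buckets).

    Assumption: plain clipping of the distance, chosen for simplicity.
    """
    half = num_buckets // 2
    distance = key_pos - query_pos
    clipped = max(-half, min(half - 1, distance))
    return clipped + half


def add_relative_bias(logits, bias_table):
    """Add bias_table[bucket(i, j)] to every query-key logit logits[i][j]."""
    num_buckets = len(bias_table)
    return [
        [
            logit + bias_table[relative_bucket(i, j, num_buckets)]
            for j, logit in enumerate(row)
        ]
        for i, row in enumerate(logits)
    ]


# Hypothetical toy values: in a real model the table is a learned parameter.
rng = random.Random(0)
num_buckets = 32
bias_table = [rng.uniform(-1.0, 1.0) for _ in range(num_buckets)]

# 4 queries x 4 keys, all raw dot products set to zero for clarity.
logits = [[0.0] * 4 for _ in range(4)]
biased = add_relative_bias(logits, bias_table)
```

Because the scalar depends only on the relative distance, every entry on the same diagonal of `biased` receives the same bias, regardless of absolute position.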
I am currently working on a `return_logits` option for `sample_autoregressive` that returns the already available logits together with `outputs`, so no extra computation is involved. If this is set...
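The general pattern can be sketched with a toy greedy decoder. This is not the library's `sample_autoregressive`: `toy_sample_autoregressive`, `logits_fn`, and `toy_logits_fn` are hypothetical names invented for the illustration, and the only point shown is that the logits needed to pick each token can be collected and returned without a second forward pass.

```python
def toy_sample_autoregressive(logits_fn, prefix, num_steps, return_logits=False):
    """Greedy decoding that can optionally return the per-step logits.

    logits_fn(tokens) returns a list of scores over the vocabulary
    for the next token (hypothetical interface).
    """
    outputs = list(prefix)
    all_logits = []
    for _ in range(num_steps):
        # Computed once per step; decoding needs these values anyway.
        step_logits = logits_fn(outputs)
        next_token = max(range(len(step_logits)), key=step_logits.__getitem__)
        outputs.append(next_token)
        if return_logits:
            # Reuse the values already in hand instead of recomputing them.
            all_logits.append(step_logits)
    if return_logits:
        return outputs, all_logits
    return outputs


def toy_logits_fn(tokens):
    """Hypothetical stand-in model: favors (last token + 1) mod 5."""
    vocab_size = 5
    target = (tokens[-1] + 1) % vocab_size
    return [1.0 if t == target else 0.0 for t in range(vocab_size)]


outputs = toy_sample_autoregressive(toy_logits_fn, [0], 3)
outputs_with_logits, step_logits = toy_sample_autoregressive(
    toy_logits_fn, [0], 3, return_logits=True
)
```

With the flag off, the function returns only the token sequence; with it on, the same sequence comes back paired with one logits vector per generated step.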