marton-avrios

Results: 2 comments by marton-avrios

For the basic relative attention scenario, I add one of N (for T5, N=32) distinct learned scalars to each query-key dot product, based on the relative distance of the corresponding...
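
A minimal sketch of that idea, in plain Python for a single attention head. It assumes the relative distance is measured between the query and key positions and maps it to one of the N scalars by simple clipping. T5 itself uses a different bucketing scheme, so `relative_bucket`, `add_relative_bias` and the random `bias_table` are hypothetical names and stand-in values for illustration, not code from any library.

```python
import random


def relative_bucket(query_pos, key_pos, num_buckets):
    """Map a signed relative distance to a bucket index in [0, num_buckets).

    Assumption: plain clipping of the distance, not T5's actual bucketing.
    """
    half = num_buckets // 2
    distance = key_pos - query_pos
    clipped = max(-half, min(half - 1, distance))
    return clipped + half


def add_relative_bias(logits, bias_table):
    """Add bias_table[bucket(i, j)] to the dot product of query i and key j."""
    num_buckets = len(bias_table)
    return [
        [logit + bias_table[relative_bucket(i, j, num_buckets)]
         for j, logit in enumerate(row)]
        for i, row in enumerate(logits)
    ]


# Toy setup: 4 positions and N = 32 scalars (random stand-ins for learned values).
rng = random.Random(0)
bias_table = [rng.uniform(-1.0, 1.0) for _ in range(32)]
logits = [[0.0] * 4 for _ in range(4)]  # pretend query-key dot products
biased = add_relative_bias(logits, bias_table)
```

Pairs with the same relative distance, such as (0, 1) and (1, 2), receive the same scalar, so the bias depends only on the offset and not on the absolute positions.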

I am currently working on a `return_logits` option for `sample_autoregressive` that just returns the already available logits together with `outputs`, so no extra computation is involved. If this is set...
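
To illustrate why such an option needs no extra computation, here is a toy greedy decoding loop in plain Python. It is a sketch under assumptions, not the real `sample_autoregressive`: `toy_sample_autoregressive`, `step_fn` and `toy_step_fn` are hypothetical, and returning a `(outputs, logits)` pair when the flag is set is one possible behavior, not necessarily the one proposed above.

```python
def toy_sample_autoregressive(step_fn, prompt, max_steps, return_logits=False):
    """Greedy decoding loop (hypothetical stand-in, not the library function).

    step_fn(tokens) returns a list of logits over the vocabulary for the
    next token.
    """
    outputs = list(prompt)
    all_logits = []
    for _ in range(max_steps):
        # The logits are computed anyway in order to choose the next token.
        logits = step_fn(outputs)
        next_token = max(range(len(logits)), key=logits.__getitem__)
        outputs.append(next_token)
        # Keeping them around adds storage but no additional model calls.
        all_logits.append(logits)
    if return_logits:
        return outputs, all_logits
    return outputs


def toy_step_fn(tokens):
    """Hypothetical model over a 3-token vocabulary: favors (last + 1) mod 3."""
    target = (tokens[-1] + 1) % 3
    return [1.0 if t == target else 0.0 for t in range(3)]


outputs, step_logits = toy_sample_autoregressive(
    toy_step_fn, [0], max_steps=3, return_logits=True
)
```

With the flag left at its default, the function returns only `outputs`, so existing callers would be unaffected in this sketch.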