reformer-pytorch

Class token implementation

Open: karttikeya opened this issue 3 years ago • 1 comment

Hi,

I was wondering how the class token is supposed to be handled in the reversible design, since replicating the token across the two residual paths is perhaps not optimal.

Any thoughts or pointers to code are appreciated.

karttikeya, May 03 '21 07:05

@karttikeya for the Reformer, I'd follow what this paper has done https://arxiv.org/abs/2103.17239 and only have the CLS token cross-attend to the full sequence for 2-3 rounds at the end, as a means of attention pooling

lucidrains, May 09 '21 14:05
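
For anyone landing here, the attention pooling suggested in the reply could look roughly like the sketch below. It assumes plain PyTorch and is not code from reformer-pytorch or from the linked paper; `ClassAttentionPool` and its arguments are hypothetical names chosen for illustration. The idea is that the reversible stack runs on the sequence alone, with no class token in either residual path, and a learned CLS token then queries the stack's output for a couple of ordinary (non-reversible) layers.

```python
import torch
import torch.nn as nn


class ClassAttentionPool(nn.Module):
    """Hypothetical helper: a learned CLS token cross-attends to the
    encoder output for `depth` layers. Only the CLS token is updated."""

    def __init__(self, dim, heads=8, depth=2):
        super().__init__()
        # Learned CLS token, introduced only after the reversible stack.
        self.cls_token = nn.Parameter(torch.randn(1, 1, dim) * 0.02)
        self.layers = nn.ModuleList([
            nn.ModuleDict({
                "norm_attn": nn.LayerNorm(dim),
                "attn": nn.MultiheadAttention(dim, heads, batch_first=True),
                "norm_ff": nn.LayerNorm(dim),
                "ff": nn.Sequential(
                    nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim)
                ),
            })
            for _ in range(depth)
        ])
        self.norm_out = nn.LayerNorm(dim)

    def forward(self, seq):
        # seq: (batch, seq_len, dim), e.g. the output of the reversible stack.
        cls = self.cls_token.expand(seq.shape[0], -1, -1)
        for layer in self.layers:
            # Keys/values are the CLS token plus the (frozen) sequence tokens.
            context = layer["norm_attn"](torch.cat([cls, seq], dim=1))
            query = context[:, :1]  # the CLS position is the only query
            attended, _ = layer["attn"](query, context, context, need_weights=False)
            cls = cls + attended
            cls = cls + layer["ff"](layer["norm_ff"](cls))
        return self.norm_out(cls).squeeze(1)  # (batch, dim)


# Usage: pool a batch of 2 sequences of 128 tokens with dim 64.
pool = ClassAttentionPool(dim=64, heads=4, depth=2)
tokens = torch.randn(2, 128, 64)  # stand-in for the encoder output
pooled = pool(tokens)             # one vector per sequence
```

Because the CLS token never enters the reversible blocks, the question of replicating it across the two residual streams does not arise; the cost is `depth` extra layers whose attention is linear in sequence length, since there is a single query.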