vit-pytorch
cls_token
All samples in a batch share the same cls_token (in the code, the cls_token is repeated batch_size times), so how do they become different during the backward pass? Since the cls_token is used as the classifier input, won't all samples in a batch be classified with the same label?
The CLS token is passed through the attention layers and aggregates information from the rest of the tokens as it makes its way up, so its output is different for every sample even though its initial value is shared.
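A minimal sketch of that point (the sizes and the single attention layer are hypothetical, not vit-pytorch's actual configuration): the CLS input is identical for both samples in the batch, but after one attention layer its output differs per sample, because attention mixes in each sample's own patch tokens.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch, num_patches, dim = 2, 4, 8  # hypothetical sizes for illustration

# cls_token is a learned parameter, identical for every sample
cls_token = nn.Parameter(torch.randn(1, 1, dim))
patch_tokens = torch.randn(batch, num_patches, dim)  # per-sample features

# prepend the (shared) CLS token to each sample's patch tokens
tokens = torch.cat([cls_token.expand(batch, -1, -1), patch_tokens], dim=1)

attn = nn.MultiheadAttention(dim, num_heads=2, batch_first=True)
out, _ = attn(tokens, tokens, tokens)

cls_out = out[:, 0]  # CLS position after attention
# identical CLS inputs, different CLS outputs per sample
print(torch.allclose(cls_out[0], cls_out[1]))
```

The printed result is False: the shared parameter does not force a shared classifier input, because by the time the CLS vector reaches the head it has already attended to sample-specific tokens.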
I had the same question, and here is my illustration of it. Remember that cls_token is a parameter, not a feature of the input. We can think of it as the starting point for producing the final label, with self-attention and the MLP acting as information-aggregating procedures. By comparing y_hat = f(cls_token, params | input) with the true label y, the cls_token and the other parameters are updated so that the model learns an effective way of aggregating information from the input.
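The learning part can be sketched the same way (again with made-up sizes and a toy one-layer model, not the actual vit-pytorch code): because cls_token is an nn.Parameter, the loss backward pass gives it a gradient just like any other weight, which is how the shared token is "updated to learn the effective way of aggregating" information.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, num_patches, dim, num_classes = 2, 4, 8, 3  # hypothetical sizes

cls_token = nn.Parameter(torch.zeros(1, 1, dim))  # a learned parameter
attn = nn.MultiheadAttention(dim, num_heads=2, batch_first=True)
head = nn.Linear(dim, num_classes)  # classifier on top of the CLS output

patches = torch.randn(batch, num_patches, dim)
tokens = torch.cat([cls_token.expand(batch, -1, -1), patches], dim=1)
out, _ = attn(tokens, tokens, tokens)
logits = head(out[:, 0])  # per-sample logits from the CLS position

labels = torch.tensor([0, 2])
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()

# the shared cls_token receives a gradient and is trained like any weight
print(cls_token.grad is not None)
```

Note that the gradient on cls_token is summed over the batch (expand broadcasts it), so the single shared token is shaped by every training sample at once.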