volo
What's the difference between `x_cls` and `x_aux`?
Congratulations on the SOTA!
`x_cls` and `x_aux` seem like NLP concepts.
How should I understand them when using VOLO as a face recognition network?
Which one is the face representation feature vector?
Hi, `x_cls` is the class token, and `x_aux` holds the output tokens of the other patches (the feature tokens).
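
To make the distinction concrete, here is a minimal sketch of how a token sequence splits into a class token and patch tokens. It assumes the usual ViT-style layout, with the class token at index 0 followed by one token per patch. The function name `split_tokens` and the shapes (196 patches, 384 channels) are made up for illustration and are not taken from the VOLO code, and NumPy stands in for the real tensors.

```python
import numpy as np


def split_tokens(tokens):
    """Split a (batch, 1 + num_patches, dim) array into class and patch tokens.

    Assumption: the class token sits at index 0 of the token axis.
    """
    cls_token = tokens[:, 0]      # (batch, dim): one vector per image
    patch_tokens = tokens[:, 1:]  # (batch, num_patches, dim): one vector per patch
    return cls_token, patch_tokens


# Hypothetical shapes: 2 images, 196 patches, 384 channels.
tokens = np.random.default_rng(0).normal(size=(2, 1 + 196, 384))
cls_token, patch_tokens = split_tokens(tokens)
print(cls_token.shape, patch_tokens.shape)
```

Under this layout the class-token side gives a single vector per image, while the patch-token side keeps one vector per spatial location. Which of the two works better as a face embedding is not settled in this thread and would need to be checked experimentally.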