
A question about DeiT's two [CLS] token processing

Open liyunlongaaa opened this issue 1 year ago • 2 comments

Hi, sorry to bother you. The paper says the two special [CLS] tokens in DeiT are averaged into a single [CLS] token, but in the code I see that they are concatenated. What am I missing?

cls_tokens = self.v.cls_token.expand(B, -1, -1) 
dist_token = self.v.dist_token.expand(B, -1, -1)
x = torch.cat((cls_tokens, dist_token, x), dim=1)

liyunlongaaa avatar Jul 27 '22 12:07 liyunlongaaa

Oh, I see it now:

x = (x[:, 0] + x[:, 1]) / 2

Sorry to bother you, and thank you for your good work. I am new to my master's program in the speech area, and I have to publish a dissertation before I can graduate. Thank you for helping me along the way, although I haven't published one yet, haha~

liyunlongaaa avatar Jul 27 '22 13:07 liyunlongaaa

To use DeiT initialization, we have to set up the two tokens in the same way as DeiT, so both are concatenated to the input sequence. But as you point out, we average their outputs in the forward pass.
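
For anyone reading this later, the two steps discussed above can be put together in a minimal sketch. This is only an illustration, not the actual model code in this repo: the class name `TwoTokenPooling`, the `embed_dim` value, and the `nn.Identity` stand-in for the transformer blocks are all assumptions made to keep the example short.

```python
import torch
import torch.nn as nn


class TwoTokenPooling(nn.Module):
    """Hypothetical toy module: prepend two tokens, then average their outputs."""

    def __init__(self, embed_dim=8):
        super().__init__()
        # Two learnable tokens, mirroring DeiT's [CLS] and distillation tokens.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.dist_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        # Assumption: Identity stands in for the real transformer blocks.
        self.blocks = nn.Identity()

    def forward(self, x):
        # x: (batch, num_patches, embed_dim)
        B = x.shape[0]
        cls_tokens = self.cls_token.expand(B, -1, -1)
        dist_token = self.dist_token.expand(B, -1, -1)
        # Step 1: concatenate both tokens in front of the patch sequence.
        x = torch.cat((cls_tokens, dist_token, x), dim=1)
        x = self.blocks(x)
        # Step 2: average the outputs at the two token positions.
        return (x[:, 0] + x[:, 1]) / 2


model = TwoTokenPooling(embed_dim=8)
patches = torch.randn(4, 10, 8)  # 4 samples, 10 patch embeddings each
pooled = model(patches)
print(pooled.shape)
```

So the concatenation keeps the token layout compatible with DeiT weights, and the averaging only happens on the output side, which yields one vector per sample.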

Good luck with your dissertation.

-Yuan

YuanGongND avatar Jul 28 '22 03:07 YuanGongND