LEI LU
I get your idea, Boss. But regarding what you mentioned: "mask out the image tokens that are largely affected by the pixel masking (any token that is overlapped...
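Here is a minimal sketch of how the quoted idea could map a pixel mask to a token mask. It assumes each image token covers a non-overlapping square patch of `patch_size` pixels; the value 16 is hypothetical, for example the downsampling factor of the tokenizer. It also assumes that "overlapped" means at least one masked pixel falls inside that patch. The helper name `pixel_mask_to_token_mask` is illustrative only.

```python
import numpy as np

def pixel_mask_to_token_mask(pixel_mask: np.ndarray, patch_size: int) -> np.ndarray:
    """Return an (H/p, W/p) bool grid; True where a token's patch contains any masked pixel."""
    h, w = pixel_mask.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image size must be a multiple of patch_size")
    # View as (patch_rows, rows_in_patch, patch_cols, cols_in_patch)
    blocks = pixel_mask.reshape(h // patch_size, patch_size, w // patch_size, patch_size)
    # Any overlap with the pixel mask marks the whole token as masked
    return blocks.any(axis=(1, 3))

# A 10x10 square that straddles the 16-pixel grid lines touches all four tokens
mask = np.zeros((32, 32), dtype=bool)
mask[10:20, 10:20] = True
straddling = pixel_mask_to_token_mask(mask, 16)

# A 16x16 square aligned to the grid touches exactly one token
aligned_mask = np.zeros((32, 32), dtype=bool)
aligned_mask[0:16, 0:16] = True
aligned = pixel_mask_to_token_mask(aligned_mask, 16)
```

Under this "any overlap" rule, a small mask that crosses grid lines still removes every token it touches. Mask placement relative to the token grid therefore matters.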
I see. So is that probably the reason why you chose mask shapes like a square, i.e. something uniform rather than a random mask, so that...
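One way to check this intuition numerically is sketched below, under hypothetical settings: a 224x224 image, 16x16-pixel tokens, and a token counted as "affected" if any of its pixels is masked. With a random per-pixel mask at ratio r, a token escapes only if none of its p^2 pixels are hit. That probability is (1 - r)^(p^2), which for r = 0.5 and p = 16 is 0.5^256, so virtually every token is affected. A grid-aligned square that hides a similar share of pixels affects only the tokens it covers.

```python
import numpy as np

def fraction_tokens_affected(pixel_mask, patch_size):
    # Fraction of tokens whose patch contains at least one masked pixel
    h, w = pixel_mask.shape
    blocks = pixel_mask.reshape(h // patch_size, patch_size, w // patch_size, patch_size)
    return blocks.any(axis=(1, 3)).mean()

rng = np.random.default_rng(0)
H = W = 224
p = 16  # hypothetical token patch size -> 14 x 14 = 196 tokens

# Random per-pixel mask hiding about 50% of the pixels
random_mask = rng.random((H, W)) < 0.5
random_frac = fraction_tokens_affected(random_mask, p)

# Grid-aligned square covering 10 x 10 tokens (100/196, about 51% of the pixels)
square_mask = np.zeros((H, W), dtype=bool)
square_mask[: 10 * p, : 10 * p] = True
square_frac = fraction_tokens_affected(square_mask, p)

# Analytic probability that a single token has no masked pixel under the random mask
p_token_untouched = 0.5 ** (p * p)
print(random_frac, square_frac, p_token_untouched)
```

Both masks hide roughly half the pixels. However, the random mask corrupts essentially all tokens, while the aligned square corrupts about half of them.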
Thanks for the detailed explanation! I am now much more confident that I understand this part. I appreciate your time and patience!
Any update on this? I can hardly think of a way to relate a mask applied directly at the pixel level to predictions in the codebook-index domain.
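Below is a minimal sketch of one possible bridge between the two domains. It builds on the token-masking idea quoted earlier in the thread and is not a confirmed implementation. The pixel mask is first turned into a token mask. Target codebook indices would come from a tokenizer run on the unmasked image; here they are replaced by random stand-ins. Cross-entropy over the codebook is then computed only on affected tokens. The helper names, image size, patch size, and codebook size are all hypothetical.

```python
import numpy as np

def pixel_mask_to_token_mask(pixel_mask, patch_size):
    # A token is masked if any pixel inside its patch is masked; flattened row-major
    h, w = pixel_mask.shape
    blocks = pixel_mask.reshape(h // patch_size, patch_size, w // patch_size, patch_size)
    return blocks.any(axis=(1, 3)).reshape(-1)

def masked_token_cross_entropy(logits, target_ids, token_mask):
    # Numerically stable log-softmax over the K codebook entries
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Negative log-likelihood of each token's target codebook index
    nll = -log_probs[np.arange(target_ids.shape[0]), target_ids]
    # Average only over tokens affected by the pixel mask
    return nll[token_mask].mean()

rng = np.random.default_rng(0)
H = W = 32
p = 16  # hypothetical pixels per token side -> 2 x 2 = 4 tokens
K = 8   # hypothetical codebook size

pixel_mask = np.zeros((H, W), dtype=bool)
pixel_mask[:16, :16] = True  # hides exactly the top-left token
token_mask = pixel_mask_to_token_mask(pixel_mask, p)

# Stand-ins: target ids would come from a frozen tokenizer on the UNMASKED image,
# logits from the model that sees the MASKED image.
target_ids = rng.integers(0, K, size=token_mask.shape[0])
logits = np.zeros((token_mask.shape[0], K))  # uniform prediction over the codebook

loss = masked_token_cross_entropy(logits, target_ids, token_mask)
print(token_mask, loss)  # uniform logits give a loss of log(K)
```

With this design, the pixel mask only decides which token positions are supervised. The prediction target stays in the codebook-index domain.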