MaskCLIP: A question about this paper

Open · ShunZuo-AI opened this issue · 0 comments

Hello, may I ask why the paper makes this statement? Why does the model need to use the class token instead of the average token, and add x to the output, in order to work with the ViT backbone?

ShunZuo-AI, Feb 07 '25 08:02