MaskCLIP

A question about the paper

Open ShunZuo-AI opened this issue 2 weeks ago • 0 comments

Hello author, could you elaborate on this statement in your paper? Why does the model need to use the class token instead of the average token, and add x to the output, in order to work with the ViT backbone?

Image

ShunZuo-AI, Feb 07 '25 08:02