About the strategy of CLIP

Open · nanfangAlan opened this issue 1 year ago · 1 comment

https://github.com/Seunggu0305/VLCounter/blob/2dc15ddd218744c2c3c63b667fa0bc4a24ce8c3c/tools/models/ViT_Encoder_add.py#L122-L128
I noticed that the code implements the maskCLIP strategy and removes the MLP from the CLIP layers. Could you provide results without this strategy? Also, would CLIP-surgery improve performance?

Mar 05 '24 12:03 nanfangAlan

We observed a performance degradation when using the CLIP-surgery approach instead of maskCLIP, even with the use of CLIP. We will share the results soon.

Mar 15 '24 00:03 Seunggu0305
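For readers unfamiliar with the two strategies, here is a minimal pure-Python sketch contrasting them on a simplified last transformer block. It is not VLCounter's `ViT_Encoder_add.py` or the official MaskCLIP / CLIP Surgery code, and it makes several simplifying assumptions: single-head attention, no biases, LayerNorm without learned parameters, and ReLU in place of CLIP's QuickGELU. The function `last_block`, its `mode` and `use_mlp` parameters, and the weight keys are hypothetical. The `"maskclip"` mode skips query-key attention so each token keeps its own value embedding, which is the core MaskCLIP idea. The `"vv"` mode uses value-value self-attention in the spirit of CLIP Surgery; its dual-path design is not reproduced. `use_mlp=False` mimics the MLP removal mentioned in the question.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum, used for residual connections."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def softmax_rows(a: Matrix) -> Matrix:
    """Numerically stable softmax over each row."""
    out = []
    for row in a:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out


def layer_norm(a: Matrix, eps: float = 1e-5) -> Matrix:
    """LayerNorm over the feature axis, without learned scale/shift (simplification)."""
    out = []
    for row in a:
        mu = sum(row) / len(row)
        var = sum((v - mu) ** 2 for v in row) / len(row)
        out.append([(v - mu) / math.sqrt(var + eps) for v in row])
    return out


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    scale = 1.0 / math.sqrt(len(q[0]))
    kt = [list(col) for col in zip(*k)]
    scores = [[s * scale for s in row] for row in matmul(q, kt)]
    return matmul(softmax_rows(scores), v)


def mlp(a: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """Two-layer MLP; ReLU stands in for CLIP's QuickGELU (assumption)."""
    hidden = [[max(0.0, v) for v in row] for row in matmul(a, w1)]
    return matmul(hidden, w2)


def last_block(x: Matrix, p: Dict[str, Matrix], mode: str = "standard",
               use_mlp: bool = True) -> Matrix:
    """Hypothetical last block of a CLIP-like ViT; x is (tokens x dim).

    mode="standard": ordinary query-key self-attention.
    mode="maskclip": no query-key attention; each token keeps its value embedding.
    mode="vv":       value-value self-attention (CLIP-Surgery-style).
    use_mlp=False drops the MLP sub-layer.
    """
    h = layer_norm(x)
    v = matmul(h, p["wv"])  # value embedding
    if mode == "standard":
        mixed = attention(matmul(h, p["wq"]), matmul(h, p["wk"]), v)
    elif mode == "maskclip":
        mixed = v  # skip q-k attention entirely: no mixing across tokens
    elif mode == "vv":
        mixed = attention(v, v, v)  # values act as queries and keys
    else:
        raise ValueError(f"unknown mode: {mode}")
    x = add(x, matmul(mixed, p["wo"]))  # output projection + residual
    if use_mlp:
        x = add(x, mlp(layer_norm(x), p["w1"], p["w2"]))
    return x
```

Swapping one input token leaves every other token's output unchanged in `"maskclip"` mode, while `"standard"` and `"vv"` both mix information across tokens.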