VLCounter
About the strategy of CLIP
https://github.com/Seunggu0305/VLCounter/blob/2dc15ddd218744c2c3c63b667fa0bc4a24ce8c3c/tools/models/ViT_Encoder_add.py#L122-L128
I noticed that the code implements the MaskCLIP strategy and removes the MLP from the CLIP layers. Could you provide the results without this strategy? Additionally, would CLIP Surgery lead to improved performance?
We observed that performance degraded when using the CLIP Surgery approach instead of MaskCLIP, even with the use of CLIP. We will provide the results soon.
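For readers following the thread, here is a rough sketch of the difference under discussion. It is not the repository's code. It assumes the common description of the MaskCLIP idea: in the modified layer, query-key attention and the MLP are skipped, and each token passes only through the value and output projections. The single-head layout, the omission of layer norm and biases, the absence of a residual connection in the modified block, the ReLU in the MLP, and all names (`standard_block`, `maskclip_style_block`, `make_params`) are illustrative assumptions. Plain Python lists are used to keep the sketch dependency-free.

```python
import math
import random


def matmul(x, w):
    """Multiply an (n, d_in) matrix by a (d_in, d_out) matrix, both as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def add(x, y):
    """Element-wise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def make_params(d, seed=0):
    """Random weights for a toy block of width d (hypothetical helper)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {"wq": mat(d, d), "wk": mat(d, d), "wv": mat(d, d),
            "wo": mat(d, d), "w1": mat(d, 4 * d), "w2": mat(4 * d, d)}


def standard_block(x, p):
    """Toy transformer block: self-attention, then a residual MLP."""
    q, k, v = matmul(x, p["wq"]), matmul(x, p["wk"]), matmul(x, p["wv"])
    scale = 1.0 / math.sqrt(len(q[0]))
    # Each token attends to every token, so information is mixed across positions.
    attn = [softmax([scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]) for qi in q]
    x = add(x, matmul(matmul(attn, v), p["wo"]))
    # ReLU is used here for simplicity; it is an assumption, not CLIP's activation.
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, p["w1"])]
    return add(x, matmul(hidden, p["w2"]))


def maskclip_style_block(x, p):
    """MaskCLIP-style variant: no query-key attention and no MLP.

    Each token goes through the value and output projections only,
    so its output depends on that token alone.
    """
    return matmul(matmul(x, p["wv"]), p["wo"])
```

In this sketch, changing one input token alters only the corresponding output of `maskclip_style_block`, whereas `standard_block` mixes information across all tokens. That per-token locality is the property usually cited as the reason for using this kind of modification in dense prediction.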