CLIMS
How to extract background image features
When computing the cosine similarity between the background and the text, are only the background features extracted? And how do you remove the features of the foreground objects? I tried blacking out the foreground object in the image and keeping the background, but CLIP sometimes still recognizes the object and gives it a high score. So I do not understand how you extract image features from the background of the image.
Hi, during training, we use a soft activation map (1-pk) to mask out the foreground object, i.e., the background image is (1-pk) * x. With the L_BTM loss, pk is optimized so that (1-pk) only activates the background regions.
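The masking described in this reply can be sketched in a few lines. This is a minimal NumPy illustration, not the CLIMS training code: the function names `split_by_activation` and `cosine_similarity`, the toy image, and the toy activation map are all assumptions made for the example. In a real pipeline the masked background image would be passed to the CLIP image encoder, and the resulting embedding compared with the text embedding.

```python
import numpy as np

def split_by_activation(x, pk):
    """Split an image into soft foreground and background parts.

    x:  image array of shape (H, W, C)
    pk: activation map of shape (H, W) with values in [0, 1]
    """
    pk = pk[..., None]            # add a channel axis so it broadcasts over C
    foreground = pk * x           # object region, weighted by pk
    background = (1.0 - pk) * x   # background region, weighted by 1 - pk
    return foreground, background

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D embedding vectors."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(a @ b)

# Toy data (hypothetical): a 4x4 image of ones with an "object"
# occupying the top-left 2x2 corner of the activation map.
x = np.ones((4, 4, 3))
pk = np.zeros((4, 4))
pk[:2, :2] = 0.9

fg, bg = split_by_activation(x, pk)
```

Because the mask is soft rather than a hard black-out, the two parts always sum back to the original image, and the masking stays differentiable with respect to pk, which is what allows a loss on the background similarity to update pk.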
OK, thank you! So the specific approach is to generate the initial CAMs p and then multiply (1-p) with the image x to mask out the foreground object. From the perspective of the image matrix, this makes the pixel values of the foreground object smaller and the pixel values of the background larger, which masks out the foreground object, right? Is there anything wrong with my understanding?
Exactly. p should be normalized to [0,1] with a sigmoid. By the way, feel free to star CLIMS. :)
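A small numeric check may help with the pixel-value question. Since a sigmoid keeps p in [0,1], the factor (1-p) can only shrink a pixel or leave it almost unchanged; it never makes a value larger in absolute terms. The background only becomes larger relative to the suppressed foreground. The logits and pixel values below are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    """Map raw activation scores to the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical raw scores: left pixel lies on the object, right pixel on the background.
logits = np.array([[6.0, -6.0]])
p = sigmoid(logits)

# Two pixels with the same original intensity.
x = np.array([[200.0, 200.0]])

# Foreground pixel is pushed towards 0; background pixel stays close to 200.
masked = (1.0 - p) * x
```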
Thank you! I have starred CLIMS.