How to extract background image features

Open cjf-repo opened this issue 2 years ago • 4 comments

When calculating the cosine similarity between the background and the text, are only the background features extracted? And how are the features of the foreground objects removed? I tried blacking out the foreground object in the image and keeping the background, but CLIP sometimes still recognizes the object and gives it a high score. So I do not know how you extracted image features from the background of the image.

cjf-repo avatar May 14 '23 12:05 cjf-repo

Hi, during training, we use a soft activation map (1-pk) to mask out the foreground and keep the background regions, i.e., (1-pk) * x. With the L_BTM loss, pk will be optimized so that (1-pk) only activates the background regions.

Sierkinhane avatar May 14 '23 14:05 Sierkinhane

OK, thank you! So the specific approach is to generate the initial CAMs p and then multiply (1-p) with the image x to mask out the foreground object. From the perspective of the image matrix, this makes the pixel values of the foreground object smaller and the pixel values of the background larger, which masks out the foreground object, right? Is there anything wrong with my understanding?

cjf-repo avatar May 14 '23 16:05 cjf-repo

Exactly. p should be normalized into [0,1] by sigmoid. Btw, welcome to star CLIMS. :)
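
For readers following the thread, here is a minimal NumPy sketch of the soft masking described above. It is an illustration, not the CLIMS implementation: the function name `split_foreground_background`, the toy image, and the raw `logits` array are all assumptions made for this example. It only shows how a sigmoid-normalized map p in [0,1] splits an image x into p * x and (1-p) * x.

```python
import numpy as np


def sigmoid(z):
    """Squash raw activation scores into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def split_foreground_background(x, logits):
    """Soft-mask an image with an activation map (hypothetical helper).

    x:      (H, W, C) float image
    logits: (H, W) raw, unnormalized activation scores
    """
    p = sigmoid(logits)[..., None]  # (H, W, 1), broadcasts over channels
    foreground = p * x              # object region is kept
    background = (1.0 - p) * x      # object region is suppressed
    return foreground, background, p


# Toy 2x2 RGB image whose pixel values are all 1.0 (assumed data).
x = np.ones((2, 2, 3), dtype=np.float32)

# Large positive logit = confident foreground, large negative = background.
logits = np.array([[8.0, -8.0],
                   [0.0, -8.0]], dtype=np.float32)

fg, bg, p = split_foreground_background(x, logits)
print(bg[..., 0])  # background weight per pixel (first channel)
```

Because 1-p lies in [0,1], the background image never exceeds the original: foreground pixels are scaled toward zero, while background pixels are left almost unchanged rather than made larger. Since the mask is soft, a pixel with an uncertain activation (logit 0 here) keeps half of its value instead of being cut to black.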

Sierkinhane avatar May 15 '23 01:05 Sierkinhane

Thank you! I have starred CLIMS.

cjf-repo avatar May 15 '23 01:05 cjf-repo