GroupViT

Multi-Label Image-Text Contrastive Loss

Open: pzhren opened this issue 3 years ago • 1 comment

Hi! Very good work. I have a question: why not align the 8 segment tokens with the generated text? Would that work better?

pzhren, Sep 05 '22 04:09

Hi @pzhren ,

Truly sorry for the late reply.

Since we don't use ground-truth masks, it's difficult to define the correct match between a text and a segment token. We did try some matching between text and segment tokens, but it didn't lead to any improvement.

xvjiarui, Sep 30 '22 04:09
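
For readers wondering what "matching between text and segment tokens" could look like, here is a minimal sketch. It is purely illustrative: it is not the GroupViT code and not necessarily the variant tried above. It assumes segment tokens and text embeddings live in the same space, and it picks, for each text, the segment token with the highest cosine similarity. All function names are hypothetical, and the toy example uses 3 two-dimensional tokens instead of 8.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_similarity_matrix(segment_tokens, text_embeddings):
    """Return sim[i][j] = cosine similarity of segment token i and text j."""
    segs = [l2_normalize(s) for s in segment_tokens]
    txts = [l2_normalize(t) for t in text_embeddings]
    return [[sum(a * b for a, b in zip(s, t)) for t in txts] for s in segs]


def match_texts_to_segments(sim):
    """For each text (column), pick the most similar segment token (row).

    This is a hypothetical max-similarity rule: without ground-truth masks
    there is no supervision saying which segment a text *should* match.
    """
    num_segments, num_texts = len(sim), len(sim[0])
    return [max(range(num_segments), key=lambda i: sim[i][j])
            for j in range(num_texts)]


def token_level_score(sim):
    """Average similarity of each text to its best-matching segment token."""
    matches = match_texts_to_segments(sim)
    return sum(sim[i][j] for j, i in enumerate(matches)) / len(matches)


# Toy data (assumed, for illustration only).
segment_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_embeddings = [[0.0, 2.0], [3.0, 3.0]]

sim = cosine_similarity_matrix(segment_tokens, text_embeddings)
matches = match_texts_to_segments(sim)
score = token_level_score(sim)
print(matches, round(score, 6))
```

A score like this could replace the image-level similarity inside a contrastive loss, but the match it relies on is chosen by the model itself, which is the difficulty described in the reply.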