Multi-Label Image-Text Contrastive Loss

Open pzhren opened this issue 3 years ago • 1 comment

Hi! Very good work. I have a question: why not align the 8 segment tokens with the generated text? Would that work better?

Sep 05 '22 04:09 pzhren

Hi @pzhren,

Truly sorry for the late reply.

Since we don't use ground-truth masks, it's difficult to define the correct match between text and segment tokens. We did try some matching between them, but it didn't lead to an improvement.

Sep 30 '22 04:09 xvjiarui