IS-BERT
Why use MI instead of InfoNCE as the loss function?
Hi, since you treat each sentence and its local context representations as positive examples, and all the local context representations from other sentences as negative examples, as is typically done in contrastive learning, why did you choose MI as the loss function instead of a conventional contrastive loss like InfoNCE? Is MI better than InfoNCE in this scenario? Thanks!
Hi, thanks for the interest. Actually, both InfoNCE and JSD can be used for MI estimation. I just found that JSD worked better when I was doing this work.
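For reference, here is a minimal sketch (not the IS-BERT code) of how the two objectives differ when computed over the same critic scores. It assumes a batch where `scores[i, j]` is the critic score between the global representation of sentence `i` and a local context representation from sentence `j`, so positives sit on the diagonal; all names here are illustrative.

```python
import torch
import torch.nn.functional as F

def jsd_mi_loss(scores: torch.Tensor) -> torch.Tensor:
    """Negative of the Jensen-Shannon MI lower bound (Deep InfoMax style):
    E_P[-softplus(-T)] - E_N[softplus(T)], with positives on the diagonal."""
    n = scores.size(0)
    pos_mask = torch.eye(n, dtype=torch.bool, device=scores.device)
    pos = -F.softplus(-scores[pos_mask]).mean()   # expectation over positive pairs
    neg = F.softplus(scores[~pos_mask]).mean()    # expectation over negative pairs
    return -(pos - neg)                           # minimize the negative bound

def infonce_loss(scores: torch.Tensor) -> torch.Tensor:
    """InfoNCE with in-batch negatives: classify the positive on the diagonal."""
    targets = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, targets)

# toy usage: random critic scores for a batch of 8 sentences
scores = torch.randn(8, 8)
print(jsd_mi_loss(scores), infonce_loss(scores))
```

Both maximize agreement between a sentence and its own local contexts against other sentences' contexts; they just correspond to different MI estimators (JSD vs. the NCE-based bound).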