
How does the hard negative contrastive loss work?

voorhs opened this issue 1 year ago · 0 comments

I am trying to understand this code snippet:

```python
negimp = neg.log().exp()
Ng = (negimp * neg).sum(dim=-1) / negimp.mean(dim=-1)
loss_pos = (-posmask * torch.log(pos / (Ng + pos))).sum() / posmask.sum()
```

As far as I can tell, this code does not implement the function that was introduced in the paper. Instead, it implements the following (using the paper's notation):

$$\ell^{i,i^+}=-\log\frac{\exp(\text{sim}(e_i,e_{i^+})/\tau)}{\displaystyle\exp(\text{sim}(e_i,e_{i^+})/\tau)+\sum_{j=1,j\neq i,j\neq i^+}^{2M}\alpha_{ij}\cdot\exp(\text{sim}(e_i,e_j)/\tau)},\quad \alpha_{ij}=\frac{\exp(\text{sim}(e_i,e_j)/\tau)}{\displaystyle\frac{1}{2M-2}\sum_{k=1,k\neq i,k\neq i^+}^{2M}\exp(\text{sim}(e_i,e_k)/\tau)}$$

where $e_1,\ldots,e_M$ and $e_{M+1},\ldots,e_{2M}$ are the context and response encodings, respectively ($i^+:=i+M$). This does not seem to match the loss stated in the paper.
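To make my reading concrete, here is a small sanity check. It assumes `neg` holds the exponentiated similarities $\exp(\text{sim}(e_i,e_j)/\tau)$ to the $2M-2$ negatives of each anchor and `pos` the exponentiated similarity to the positive (these tensor contents are my assumption, not stated in the repo). Since `neg.log().exp()` is numerically the identity on positive tensors, the snippet's `Ng` coincides with $\sum_j \alpha_{ij}\exp(\text{sim}(e_i,e_j)/\tau)$ from the formula above:

```python
import torch

torch.manual_seed(0)
M = 4
# Hypothetical exponentiated similarities (strictly positive):
neg = torch.rand(2 * M, 2 * M - 2) + 0.1  # exp(sim/tau) to the 2M-2 negatives
pos = torch.rand(2 * M) + 0.1             # exp(sim/tau) to the positive

# Snippet from the issue (negimp == neg up to float rounding):
negimp = neg.log().exp()
Ng = (negimp * neg).sum(dim=-1) / negimp.mean(dim=-1)

# Same quantity written as in the formula:
# alpha_ij = exp_ij / ((1/(2M-2)) * sum_k exp_ik), then Ng = sum_j alpha_ij * exp_ij
alpha = neg / neg.mean(dim=-1, keepdim=True)
Ng_formula = (alpha * neg).sum(dim=-1)
assert torch.allclose(Ng, Ng_formula)

# The resulting loss matches -log(pos / (pos + Ng)) averaged over valid anchors
# (here posmask is all ones, a hypothetical choice for the check):
posmask = torch.ones_like(pos)
loss_pos = (-posmask * torch.log(pos / (Ng + pos))).sum() / posmask.sum()
loss_formula = (-torch.log(pos / (pos + Ng_formula))).mean()
assert torch.allclose(loss_pos, loss_formula)
```

So, unless I misread the tensors' contents, the code computes exactly the reweighted loss I wrote above, not the one in the paper.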

Am I misinterpreting the paper and/or the code?

voorhs · Sep 17 '23 12:09