dse
How does the hard negative contrastive loss work?
I am trying to understand this code snippet:
```python
negimp = neg.log().exp()  # elementwise identity: exp(log(x)) = x, so negimp == neg
Ng = (negimp * neg).sum(dim=-1) / negimp.mean(dim=-1)  # reweighted sum over negatives
loss_pos = (-posmask * torch.log(pos / (Ng + pos))).sum() / posmask.sum()  # loss averaged over masked positives
```
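To make the question concrete, here is a minimal, self-contained sketch of how I read the snippet. The setup (shapes, `tau`, and the assumption that `pos`/`neg` hold exponentiated similarities over a batch of 2M anchors with 2M-2 in-batch negatives each) is my guess, not taken from the repo:

```python
import torch

# Toy setup (my assumptions, not from the repo): for each of the 2M anchors,
# `pos` holds exp(sim(e_i, e_{i+})/tau) and `neg` holds exp(sim(e_i, e_j)/tau)
# over the 2M - 2 in-batch negatives j.
M, tau = 4, 0.05
sim_pos = torch.rand(2 * M) * 2 - 1              # cosine similarities in [-1, 1]
sim_neg = torch.rand(2 * M, 2 * M - 2) * 2 - 1
pos = (sim_pos / tau).exp()
neg = (sim_neg / tau).exp()
posmask = torch.ones(2 * M)                      # here, every anchor has a positive

# The snippet in question:
negimp = neg.log().exp()                         # == neg elementwise
Ng = (negimp * neg).sum(dim=-1) / negimp.mean(dim=-1)
loss_pos = (-posmask * torch.log(pos / (Ng + pos))).sum() / posmask.sum()
print(loss_pos)
```

Note that `neg.log().exp()` is an elementwise identity (all entries of `neg` are positive exponentials), so `negimp` is just `neg`; that is exactly what makes the resulting weights proportional to $\exp(\text{sim}(e_i,e_j)/\tau)$, as written out below.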
It seems to me that this code does not implement the function introduced in the paper. Instead, it implements the following (in the paper's notation):
$$
\ell^{i,i^+} = -\log\frac{\exp(\text{sim}(e_i, e_{i^+})/\tau)}{\exp(\text{sim}(e_i, e_{i^+})/\tau) + \sum_{j=1,\, j\neq i,\, j\neq i^+}^{2M} \alpha_{ij}\cdot\exp(\text{sim}(e_i, e_j)/\tau)},
\qquad
\alpha_{ij} = \frac{\exp(\text{sim}(e_i, e_j)/\tau)}{\frac{1}{2M-2}\sum_{k=1,\, k\neq i,\, k\neq i^+}^{2M} \exp(\text{sim}(e_i, e_k)/\tau)}
$$
where $e_1,\ldots,e_M$ and $e_{M+1},\ldots,e_{2M}$ are the context and response encodings, respectively (so $i^+ := i + M$). A quick numerical check below confirms that this is what the code computes, yet it does not seem to match the loss as stated in the paper.
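Here is the check (toy values, my own setup): the code's `Ng` coincides with the $\alpha$-weighted sum of negatives from the equation above.

```python
import torch

M, tau = 4, 0.05
# exp(sim(e_i, e_j)/tau) over 2M anchors and 2M - 2 negatives each
neg = ((torch.rand(2 * M, 2 * M - 2) * 2 - 1) / tau).exp()

# What the code computes:
negimp = neg.log().exp()                          # == neg elementwise
Ng_code = (negimp * neg).sum(dim=-1) / negimp.mean(dim=-1)

# What my equation says:
# alpha_ij = exp(sim_ij/tau) / ((1/(2M-2)) * sum_k exp(sim_ik/tau))
alpha = neg / neg.mean(dim=-1, keepdim=True)
Ng_formula = (alpha * neg).sum(dim=-1)

print(torch.allclose(Ng_code, Ng_formula))        # True
```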
Am I misinterpreting the paper and/or the code?