Yassine
I think you're right, the relative embeddings are added to the keys and not the values, as detailed in eq. 3 of the paper. But given that we unfold the...
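To make the key-side addition concrete, here is a minimal sketch of relative attention logits where a learned relative embedding is added to the keys before the dot product, while the values are left untouched. The function name and shapes are my own toy setup, not the paper's code:

```python
import torch

def rel_attention_logits(q, k, rel):
    """Attention logits with relative embeddings added to the keys.

    q:   (L, d) queries
    k:   (L, d) keys
    rel: (L, L, d) relative position embeddings, rel[i, j] for pair (i, j)
    """
    # the keys are shifted by the relative embedding before the dot
    # product; the values never see the relative embedding
    k_rel = k.unsqueeze(0) + rel                  # (L, L, d)
    logits = torch.einsum('id,ijd->ij', q, k_rel)  # (L, L)
    return logits

L, d = 4, 8
q = torch.randn(L, d)
k = torch.randn(L, d)
rel = torch.randn(L, L, d)
logits = rel_attention_logits(q, k, rel)
print(logits.shape)  # torch.Size([4, 4])
```

This decomposes into the usual content term plus a relative term, i.e. `q @ k.t() + einsum('id,ijd->ij', q, rel)`, which is why adding to keys (not values) changes the attention weights but not what is aggregated.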
I have a similar question about the loss calculation; we have:

```python
et = torch.mean(torch.exp(M(z_bar, x_tilde)))
M.ma_et += ma_rate * (et.detach().item() - M.ma_et)
mutual_information = torch.mean(M(z, x_tilde)) - torch.log(et) *...
```
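For anyone else puzzled by the `ma_et` line: this looks like MINE's moving-average correction for the biased gradient of the `log`-of-mean term. A minimal self-contained sketch of that trick (the statistics network, `ma_rate`, and all shapes are placeholders of my own, not this repo's code):

```python
import torch
import torch.nn as nn

# toy statistics network T(z, x); the architecture is a placeholder
T = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))

ma_et = 1.0      # running average of E[e^T] over marginal samples
ma_rate = 0.01

def mine_loss(z, x, z_bar):
    """z, x: joint samples; z_bar: z shuffled so (z_bar, x) ~ marginals."""
    global ma_et
    t_joint = T(torch.cat([z, x], dim=1)).mean()
    et = torch.exp(T(torch.cat([z_bar, x], dim=1))).mean()
    # update the moving average outside the autograd graph
    ma_et = ma_et + ma_rate * (et.detach().item() - ma_et)
    # backpropagating through log(et) gives a biased minibatch gradient;
    # dividing by the detached moving average instead de-biases it
    loss = -(t_joint - et / ma_et)
    return loss

z = torch.randn(32, 2)
x = torch.randn(32, 2)
loss = mine_loss(z, x, z[torch.randperm(32)])
print(loss.requires_grad)  # True
```

The reported MI estimate would still use `log(et)`; only the term used for the backward pass swaps the denominator for the moving average.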
This is a second, more efficient way of computing the negatives / positives; check appendix A2, where they go through both ways, with a discriminator and with a non-linear...
Thanks @qianlanwyd
Thank you very much!
Hi, I agree with the above. About the MoCo trick, I am curious about how it is implemented; in the original paper, the elements of the momentum dict are only used as...
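For reference, here is a minimal sketch of how the MoCo momentum encoder and key queue are usually wired together (toy encoders and dimensions of my own, not this repo's implementation):

```python
import torch
import torch.nn as nn

dim, K, m = 8, 64, 0.999   # feature dim, queue size, momentum (toy values)

encoder_q = nn.Linear(4, dim)
encoder_k = nn.Linear(4, dim)
encoder_k.load_state_dict(encoder_q.state_dict())
for p in encoder_k.parameters():
    p.requires_grad = False   # the key encoder is never updated by gradients

queue = torch.randn(K, dim)   # dictionary of past keys, used only as negatives

@torch.no_grad()
def momentum_update():
    # key encoder trails the query encoder: theta_k <- m*theta_k + (1-m)*theta_q
    for pq, pk in zip(encoder_q.parameters(), encoder_k.parameters()):
        pk.mul_(m).add_(pq, alpha=1 - m)

def step(x_q, x_k):
    global queue
    q = encoder_q(x_q)
    with torch.no_grad():
        momentum_update()
        k = encoder_k(x_k)       # keys carry no gradient
    # positives: the current keys; negatives: everything in the queue
    l_pos = (q * k).sum(dim=1, keepdim=True)   # (N, 1)
    l_neg = q @ queue.t()                      # (N, K)
    logits = torch.cat([l_pos, l_neg], dim=1)  # (N, 1 + K)
    # enqueue the new keys, dequeue the oldest
    queue = torch.cat([k, queue], dim=0)[:K]
    return logits

logits = step(torch.randn(16, 4), torch.randn(16, 4))
print(logits.shape)  # torch.Size([16, 65])
```

So the dict elements are indeed only ever consumed as negatives; gradients flow through `q` alone, and the key encoder moves only via the momentum update.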
@HobbitLong Can you please provide some additional info on how we compute the loss in case of segmentation & depth ? I can't see how to compute the contrastive loss...
I had the same question. As I understand it, ContrastLoss(out_v2) will not have any gradients, given that the teacher is not being trained.
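A quick way to convince yourself of this (toy student/teacher of my own, just to show the autograd behavior): with a frozen teacher, its output has `requires_grad=False`, so a loss computed only on it contributes nothing to the student's gradients.

```python
import torch
import torch.nn as nn

student = nn.Linear(3, 3)
teacher = nn.Linear(3, 3)
for p in teacher.parameters():
    p.requires_grad = False   # frozen teacher, as in the discussion above

x = torch.randn(5, 3)
out_v1 = student(x)   # attached to the graph
out_v2 = teacher(x)   # detached in practice: no parameter requires grad

print(out_v1.requires_grad, out_v2.requires_grad)  # True False
```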
Yes you are right, thanks.
I tried writing a Cython wrapper, but no luck; I ran into some difficulties when compiling with distutils.
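In case it helps anyone hitting the same build errors, this is the standard minimal `setup.py` for compiling a Cython extension (the module and file names are placeholders, not the ones from this repo):

```python
# setup.py — build in place with: python setup.py build_ext --inplace
from setuptools import setup
from Cython.Build import cythonize

setup(
    name="mywrapper",                        # placeholder package name
    ext_modules=cythonize("mywrapper.pyx"),  # placeholder .pyx source
)
```

Note that `distutils` is deprecated in recent Python versions; importing `setup` from `setuptools` (with Cython installed) is the usual fix when the plain-distutils route fails.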