SimCLR
About gradient accumulation
Hi,
Thanks for your implementation. I have a question regarding the gradient accumulation part of the NT-Xent loss. Even though we divide the loss by num_accumulation_steps at each mini-batch, the expression loss = torch.mean(-torch.log(sim_match / (torch.sum(sim_mat, dim=-1) - norm_sum))) still produces loss values that are not comparable, because "torch.sum(sim_mat, dim=-1) - norm_sum" operates on a matrix of shape [2*batch_size, 2*batch_size], so each row's denominator sums over a number of negatives that depends on the mini-batch size. For example, running with "batch_size 256, accumulation steps 1" and "batch_size 64, accumulation steps 4" gives loss values that are not similar.
Any comments about this?
Thanks
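To illustrate, here is a self-contained sketch of the loss as it appears in the snippet above. The way sim_mat, sim_match, and norm_sum are built here is my own reconstruction from their apparent meaning, not the repo's exact code; it compares one 256-sample step against the mean of four 64-sample steps:

import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    # Reconstruction of the loss in the snippet above; sim_mat / sim_match / norm_sum
    # are rebuilt from their apparent meaning, not copied from the repo.
    b = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=-1)               # [2B, D]
    sim_mat = torch.exp(z @ z.t() / temperature)                      # [2B, 2B] pairwise similarities
    pos = torch.exp((F.normalize(z1, dim=-1) * F.normalize(z2, dim=-1)).sum(-1) / temperature)
    sim_match = torch.cat([pos, pos])                                 # [2B] positive-pair similarities
    norm_sum = torch.exp(torch.ones(2 * b) / temperature)             # removes each row's self-similarity
    return torch.mean(-torch.log(sim_match / (sim_mat.sum(dim=-1) - norm_sum)))

torch.manual_seed(0)
h1, h2 = torch.randn(256, 128), torch.randn(256, 128)                 # projections of two augmented views

full = nt_xent(h1, h2)                                                # batch_size 256, 1 step
accum = torch.stack([nt_xent(h1[i:i + 64], h2[i:i + 64])
                     for i in range(0, 256, 64)]).mean()              # batch_size 64, 4 steps
print(full.item(), accum.item())                                      # values differ: each row's denominator
                                                                      # sums over 2B - 1 terms, which depends on B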
IMHO, this loss is not additive by design (similar to the F1 score), so averaging it over accumulation steps is not equivalent to computing it on the full batch.
Instead, the features should be collected over the virtual batches and the loss should then be applied once to all of them, as sketched below.
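A rough sketch of that idea, assuming the nt_xent helper from the sketch above and a hypothetical encoder/optimizer:

import torch

def simclr_step(encoder, optimizer, x1, x2, micro_batch=64):
    # Run the encoder on each virtual (micro-)batch, concatenate the projections,
    # and apply NT-Xent once over the full 2B x 2B similarity matrix.
    optimizer.zero_grad()
    z1_parts, z2_parts = [], []
    for i in range(0, x1.size(0), micro_batch):
        z1_parts.append(encoder(x1[i:i + micro_batch]))
        z2_parts.append(encoder(x2[i:i + micro_batch]))
    z1, z2 = torch.cat(z1_parts), torch.cat(z2_parts)
    loss = nt_xent(z1, z2)      # one loss over the full batch, so the value matches large-batch training
    loss.backward()
    optimizer.step()
    return loss.item()

Note that the autograd graphs of all micro-batches are kept alive until the single backward call, so this makes the loss value consistent but does not reduce activation memory the way plain gradient accumulation does; saving memory as well would require something like re-running each micro-batch against cached, detached features.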