solo-learn
Question about --scale_loss
Hi,
Could you explain the effect of --scale_loss?
Does the option normalize the gradients of the batch normalization and bias parameters when used together with --exclude_bias_n_norm, as explained in the original Barlow Twins repository?
Thank you.
Hi,
--scale_loss was a parameter that Barlow Twins had in their original version. It is not mentioned in the paper, but you can check the commit that removed it from their official repo some time ago: https://github.com/facebookresearch/barlowtwins/commit/046eec3c7f5c098b42cdf43e04df332957637d6a.
It produces basically the same results; they just wanted to remove the extra hyperparameter. We opted to keep it to stay consistent with the checkpoints we have for the other datasets.
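To see why removing the parameter produces the same results, here is a minimal sketch (not solo-learn code) of the equivalence: with plain SGD and no weight decay or momentum, multiplying the loss by a constant scales the gradients by that constant, which is the same as folding the constant into the learning rate.

```python
# Sketch of why --scale_loss can be folded into the lr (assumption:
# plain SGD, no momentum, no weight decay).
def sgd_step(w, grad, lr):
    # One vanilla SGD update.
    return w - lr * grad

w0, grad, lr, scale = 1.0, 0.5, 0.1, 0.024

# Scaling the loss by `scale` scales the gradient by `scale`...
w_scaled_loss = sgd_step(w0, scale * grad, lr)
# ...which is identical to scaling the learning rate instead.
w_scaled_lr = sgd_step(w0, grad, scale * lr)

assert abs(w_scaled_loss - w_scaled_lr) < 1e-12
```

With momentum or adaptive optimizers the equivalence is only approximate, which is part of why keeping the explicit hyperparameter preserves exact reproducibility of existing checkpoints.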
--exclude_bias_n_norm excludes those parameters from LARS, as first described in BYOL's paper.
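A hypothetical sketch of what such an exclusion looks like in practice (the names `split_params` and `exclude_from_adaptation` are illustrative, not solo-learn's actual API): biases and normalization parameters are placed in a separate parameter group that a LARS implementation can skip when applying layer-wise adaptation (and typically weight decay).

```python
# Hypothetical sketch: split parameters so LARS can skip biases and
# batch-norm parameters. Group keys other than "params" and
# "weight_decay" are illustrative assumptions.
def split_params(named_params):
    regular, excluded = [], []
    for name, p in named_params:
        # Heuristic match on parameter names (assumption; real
        # implementations may check tensor dimensionality instead).
        if name.endswith(".bias") or "bn" in name:
            excluded.append(p)
        else:
            regular.append(p)
    return [
        {"params": regular},
        {"params": excluded, "weight_decay": 0.0,
         "exclude_from_adaptation": True},
    ]

# Toy "model" with stand-in parameter values.
named = [("conv1.weight", "w1"), ("conv1.bias", "b1"), ("bn1.weight", "g1")]
groups = split_params(named)
```

The optimizer would then read the flag on each group and fall back to the plain (non-layer-adapted) update for the excluded one.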
Thanks for your quick reply.
I understand the --scale_loss and --exclude_bias_n_norm options.
Let me confirm the following points:
- solo-learn's Barlow Twins implementation is based on the original (previous) implementation in the official repo.
- In solo-learn's Barlow Twins implementation, using --scale_loss implicitly rescales the effective learning rate (including for the biases and batch-norm parameters), just as in the previous official implementation.
Exactly, our implementation should match the original Barlow Twins implementation.
I got it! Thank you.