byol-pytorch
Why the loss is different from BYOL authors'
I found that the loss differs from the one described in the BYOL paper, which should be an L2 loss, and I didn't find an explanation. The loss in this repo is a cosine loss, and I just want to know why. BTW, thanks for this great repo!
If you read section J.3 in the paper, the code is identical.
Thanks for your reply! I see: after normalization, the L2 loss is equivalent to the cosine loss, since for unit vectors ||x - y||^2 = 2 - 2 * cos(x, y).
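For future readers, here is a minimal sketch that checks this identity numerically. The tensor shapes and variable names are illustrative, not taken from the repo's actual forward pass:

```python
import torch
import torch.nn.functional as F

# Two arbitrary (unnormalized) prediction/target vectors.
x = torch.randn(8, 256)
y = torch.randn(8, 256)

# L2 form: squared distance between L2-normalized vectors,
# as written in section J.3 of the BYOL paper.
l2_form = (F.normalize(x, dim=-1) - F.normalize(y, dim=-1)).pow(2).sum(dim=-1)

# Cosine form: for unit vectors, ||x - y||^2 = 2 - 2 * cos(x, y).
cos_form = 2 - 2 * F.cosine_similarity(x, y, dim=-1)

print(torch.allclose(l2_form, cos_form, atol=1e-5))  # True
```

So the two losses differ only by a constant shift and scale, which leaves the gradient direction unchanged; minimizing one is equivalent to minimizing the other.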