
Add configurable normalization epsilon for NTXentLoss

Open mieszkokl opened this issue 1 year ago • 1 comment

When training with half precision, I noticed that the normalization in NTXentLoss can produce NaN values.

In the forward method, there is this code:

        # normalize the output to length 1
        out0 = nn.functional.normalize(out0, dim=1)
        out1 = nn.functional.normalize(out1, dim=1)

It uses the torch.nn.functional.normalize function with the default epsilon of 1e-12, which underflows to 0 in half precision. As a result, we get a division by zero and NaN values in the output.
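For illustration (not part of the original report), a minimal way to see the effect: normalizing an all-zero half-precision tensor with the default eps divides 0 by 0, because 1e-12 cannot be represented in float16.

    import torch
    import torch.nn.functional as F

    # With float16, eps=1e-12 rounds to 0, so a zero-norm row is divided by 0.
    x = torch.zeros(2, 4, dtype=torch.float16)
    print(F.normalize(x, dim=1))  # all-NaN tensor instead of zeros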

One way to solve this is to add an optional normalization epsilon parameter to the NTXentLoss initializer and pass it to torch.nn.functional.normalize.
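A minimal sketch of what this could look like; the argument name eps, its default value, and the surrounding signature are illustrative rather than the actual lightly API, and the loss computation itself is omitted:

    import torch
    import torch.nn as nn

    class NTXentLossSketch(nn.Module):
        """Sketch: shows only how a configurable eps could be threaded through."""

        def __init__(self, temperature: float = 0.5, eps: float = 1e-6):
            super().__init__()
            self.temperature = temperature
            self.eps = eps

        def forward(self, out0: torch.Tensor, out1: torch.Tensor):
            # normalize the output to length 1 with an eps that does not
            # underflow to 0 in float16
            out0 = nn.functional.normalize(out0, dim=1, eps=self.eps)
            out1 = nn.functional.normalize(out1, dim=1, eps=self.eps)
            # ... the actual NT-Xent computation on the normalized embeddings
            # goes here; it is omitted in this sketch
            return out0, out1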

Please let me know if there is a mistake in my understanding. If it's okay with you, I can propose a pull request.

mieszkokl avatar May 29 '23 09:05 mieszkokl

You are right, 1e-12 is too small for torch.HalfTensor:

>>> torch.HalfTensor([1e-12])
tensor([0.], dtype=torch.float16)

However, the only way to run into this problem is if the tensors out0 and out1 have a norm smaller than 1e-12, which is a very unlikely scenario and could hint at a bug in your code. That being said, I think your fix is reasonable and we'd welcome your pull request.
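As a quick sanity check (added for illustration, not from the original thread), an eps that is still representable in float16 turns the NaN back into a well-defined zero for a zero-norm input:

    import torch
    import torch.nn.functional as F

    x = torch.zeros(1, 4, dtype=torch.float16)
    print(F.normalize(x, dim=1))            # default eps=1e-12 underflows -> NaN
    print(F.normalize(x, dim=1, eps=1e-6))  # 1e-6 is representable in float16 -> zeros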

philippmwirth avatar May 30 '23 06:05 philippmwirth