
Why does the loss function return "nan"?

Open · Zhudongsheng75 opened this issue 2 years ago · 8 comments

I want to use this loss function, but it returns 'nan' when I run the test code. Can anyone tell me why? Thanks!

[Screenshots attached: 2022-03-18, 12:50 PM]

Zhudongsheng75 avatar Mar 18 '22 04:03 Zhudongsheng75

Please check the updated losses.py in the pull requests.

evechny131 avatar Mar 31 '22 05:03 evechny131

I am also facing the same issue.

The suggested fix also doesn't help:

https://colab.research.google.com/drive/14IJ_xrfOexa7X_uM7dURT-itoVJoLjo2?usp=sharing

ksivajana avatar Mar 31 '22 10:03 ksivajana

I also have met this problem. In my opinion, the line that calculates log_prob is not robust enough: there is a small probability that it computes log(0), which produces nan in the loss. https://github.com/HobbitLong/SupContrast/blob/a8a275b3a8b9b9bdc9c527f199d5b9be58148543/losses.py#L89

It is better rewritten as: log_prob = logits - torch.log(exp_logits.sum(1, keepdim=True) + 1e-6)
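
For context, here is a minimal runnable sketch of the affected computation with the suggested guard; the random tensors stand in for the real similarities in losses.py and are only illustrative:

```python
import torch

# Stand-ins for the tensors in losses.py: `logits` are temperature-scaled
# similarities and `logits_mask` zeroes out each sample's self-contrast entry.
logits = torch.randn(8, 8)
logits_mask = 1.0 - torch.eye(8)

exp_logits = torch.exp(logits) * logits_mask
# Original line 89: torch.log(exp_logits.sum(1, keepdim=True)) yields -inf
# (and then nan in the loss) whenever the row sum underflows to 0.
log_prob = logits - torch.log(exp_logits.sum(1, keepdim=True) + 1e-6)
```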

yuanlonghui avatar May 09 '22 13:05 yuanlonghui

I also have met this problem. I use the SupCon loss with ResNet-12 as the encoder and temperature=0.07, and I printed the logits at /SupContrast/blob/master/losses.py line 74.

When the features were not normalized, the logits were close to -9000, and at losses.py line 88 the exp_logits elements were exactly zero. When the features were normalized, the logits were close to -10, and the exp_logits elements were on the order of 1e-07.

Maybe temperature > 1 and normalize(features) can help.
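
To see the underflow being described, a quick check with the magnitudes reported above:

```python
import torch

logits = torch.tensor([-9000.0, -10.0])
print(torch.exp(logits))
# tensor([0.0000e+00, 4.5400e-05]) -- exp(-9000) underflows to exactly 0,
# so the subsequent log(sum(exp_logits)) can become log(0) = -inf.
```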

Jf-Chen avatar Jun 16 '22 02:06 Jf-Chen

I have also faced this issue. Apart from the solutions mentioned above, I would suggest the following change.

In line 92: mean_log_prob_pos = (mask * log_prob).sum(1) / mask.sum(1)

It is possible for all of the elements in one row of mask to be zero, which causes a division by zero and results in nan loss values. The following modification worked for me: mean_log_prob_pos = (mask * log_prob).sum(1) / (mask.sum(1) + 1e-6)

Also, the input features need to be normalized, as pointed out earlier in the comments. See the sketch below.
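
A small runnable illustration of this failure mode; the mask layout here is hypothetical, standing in for the case in losses.py where an anchor has no positives in the batch:

```python
import torch

log_prob = torch.randn(3, 3)
# Row 0 has no positives, e.g. its class appears only once in the batch.
mask = torch.tensor([[0., 0., 0.],
                     [0., 0., 1.],
                     [0., 1., 0.]])

nan_version  = (mask * log_prob).sum(1) / mask.sum(1)           # row 0: 0/0 -> nan
safe_version = (mask * log_prob).sum(1) / (mask.sum(1) + 1e-6)  # row 0: 0.0
print(nan_version)
print(safe_version)
```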

Shubhammawa avatar Jul 14 '22 11:07 Shubhammawa

Hi! @Jf-Chen, just as a clarification: when you say to normalize the features with normalize(features), you mean along the feature dimension (so here 128), not batch normalization, right?

piconti avatar Nov 03 '22 10:11 piconti

> Hi! @Jf-Chen, just as a clarification: when you say to normalize the features with normalize(features), you mean along the feature dimension (so here 128), not batch normalization, right?

I know little about normalization methods, but my purpose for normalizing is clear: limit the magnitude of the feature values and avoid large negative logits, for example -9000.

For better understanding, take a feature tensor of shape [128, 640] as an example; it is [batch_size, dim], the output of the projector in SupCon:

logits_unnorm = self.proj(features_all)  # logits_unnorm is [128, 640]
logits = F.normalize(logits_unnorm, dim=1)

The PyTorch documentation says dim is the dimension to reduce, with indexing starting from zero, so reducing over the 640 features uses dim=1.

I am not sure whether my understanding is right, but it works well: no bugs and satisfactory accuracy.

Is it called batch normalization? I am not sure; there are too many formulas on the wiki. :eyes:
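
For reference, F.normalize(..., dim=1) is per-sample L2 normalization along the feature dimension, which is not batch normalization (BatchNorm normalizes each feature using statistics across the batch). A minimal check with the shapes from the example above:

```python
import torch
import torch.nn.functional as F

features = torch.randn(128, 640)       # [batch_size, dim]

normed = F.normalize(features, dim=1)  # unit L2 norm per sample (row)
print(normed.norm(dim=1)[:3])          # each value is ~1.0
```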

Jf-Chen avatar Nov 03 '22 14:11 Jf-Chen

> I want to use this loss function, but it returns 'nan' when I run the test code. Can anyone tell me why? Thanks!
>
> [Screenshots attached: 2022-03-18, 12:50 PM]

Add F.normalize(features, dim=1), and it works.
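
Putting the thread's fixes together, a minimal end-to-end sketch; it assumes the repo's SupConLoss interface, which expects features of shape [bsz, n_views, dim]:

```python
import torch
import torch.nn.functional as F
from losses import SupConLoss  # this repo's losses.py

criterion = SupConLoss(temperature=0.07)

bsz, n_views, dim = 16, 2, 128
features = torch.randn(bsz, n_views, dim)  # e.g. projector output for two views
features = F.normalize(features, dim=2)    # unit norm along the feature dim
labels = torch.randint(0, 4, (bsz,))

loss = criterion(features, labels)
print(loss)  # finite, no nan
```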

HYC01 avatar Jan 21 '24 07:01 HYC01