ContrastiveSeg
Running the training code shows loss=nan on the COCO dataset.
I encountered the same problem with the coco_stuff dataset.
2023-08-07 16:07:13,007 INFO [trainer.py, 229] Train Epoch: 0 Train Iteration: 30 Time 4.248s / 10iters, (0.425) Forward Time 2.557s / 10iters, (0.256) Backward Time 1.583s / 10iters, (0.158) Loss Time 0.075s / 10iters, (0.007) Data load 0.033s / 10iters, (0.003317) Learning rate = [0.0009995649894856365, 0.009995649894856365, 0.009995649894856365] Loss = nan (ave = nan)
Is it because there aren't enough training epochs?
I encountered the same error. Have you solved this problem?
I encountered the same problem when I changed the architecture to my own model. In my case, I found that some elements of exp_logits + neg_logits could be zero, producing inf after the log function.

After changing https://github.com/tfzhou/ContrastiveSeg/blob/287e5d3069ce6d7a1517ddf98e004c00f23f8f99/lib/loss/loss_contrast.py#L121 to

log_prob = logits - torch.log(exp_logits + neg_logits + 1e-10)

everything worked well.
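To illustrate the mechanism, here is a minimal sketch of why the nan appears and how the epsilon fixes it. The tensors below are illustrative stand-ins, not the repository's actual inputs; only the `log_prob` expression mirrors the fix above.

```python
import torch

# Illustrative similarity scores; exp(-200) underflows to 0 in float32.
logits = torch.tensor([[-200.0, 0.0], [0.5, 1.0]])
exp_logits = torch.exp(logits)
neg_logits = torch.zeros_like(logits)  # stand-in for the negative-pair term

# Unstable: where exp_logits + neg_logits == 0, torch.log returns -inf,
# which turns into inf here and later propagates to nan in the averaged loss.
log_prob_bad = logits - torch.log(exp_logits + neg_logits)

# Stable: the epsilon keeps the argument of log strictly positive.
log_prob_ok = logits - torch.log(exp_logits + neg_logits + 1e-10)

print(log_prob_bad)  # contains inf at the underflowed entry
print(log_prob_ok)   # finite everywhere
```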
That makes sense to me, thanks for your help!