
Running the code shows loss=nan when training on the COCO dataset.


Running the code shows loss=nan when training on the COCO dataset.

liaochuanlin avatar Aug 04 '23 09:08 liaochuanlin

I encountered the same problem with the coco_stuff dataset.

2023-08-07 16:07:13,007 INFO [trainer.py, 229] Train Epoch: 0 Train Iteration: 30 Time 4.248s / 10iters, (0.425) Forward Time 2.557s / 10iters, (0.256) Backward Time 1.583s / 10iters, (0.158) Loss Time 0.075s / 10iters, (0.007) Data load 0.033s / 10iters, (0.003317) Learning rate = [0.0009995649894856365, 0.009995649894856365, 0.009995649894856365] Loss = nan (ave = nan)

Is it because there aren't enough training epochs?

sanitizer84 avatar Aug 07 '23 09:08 sanitizer84

I encountered the same error. Have you solved this problem?

SchuckLee avatar Mar 26 '24 14:03 SchuckLee

I encountered the same problem when I changed the architecture to my own model. In my case, I found that some elements of exp_logits + neg_logits could be exactly zero, so the log produces a non-finite value and the loss becomes nan.

After making a small change from https://github.com/tfzhou/ContrastiveSeg/blob/287e5d3069ce6d7a1517ddf98e004c00f23f8f99/lib/loss/loss_contrast.py#L121 to

log_prob = logits - torch.log(exp_logits + neg_logits + 1e-10)

everything worked fine.
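To see the failure mode in isolation, here is a minimal sketch with toy values (not the repository's actual tensors; the names exp_logits and neg_logits just mirror those in loss_contrast.py) showing how a zero sum under the log turns into a non-finite value, and how the 1e-10 epsilon avoids it:

```python
import torch

# Toy illustration: if any element of exp_logits + neg_logits is exactly
# zero, torch.log returns -inf, and the non-finite value propagates into
# the averaged loss, which then shows up as nan/inf in the training log.
logits = torch.tensor([0.0, -200.0])    # exp(-200) underflows to 0 in float32
exp_logits = torch.exp(logits)          # tensor([1., 0.])
neg_logits = torch.zeros_like(logits)   # assume no negative pairs contribute here

unsafe = logits - torch.log(exp_logits + neg_logits)          # second entry is inf
safe   = logits - torch.log(exp_logits + neg_logits + 1e-10)  # both entries finite

print(unsafe)  # tensor([0., inf])
print(safe)    # roughly [0, -176.97]
```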

kevinkevin556 avatar Apr 29 '24 08:04 kevinkevin556


That makes sense to me, thanks for your help!

SchuckLee avatar May 03 '24 08:05 SchuckLee