
NaN losses and 100% accuracy

Open xsk07 opened this issue 2 years ago • 1 comments

Hello,

I was trying to train the model, but I'm running into issues with the accuracy and loss values. I'm currently using the DeepGlobe dataset, where mask pixels are encoded with 0/255 RGB values, i.e., the following mapping: 'unknown': [0, 0, 0], 'urban': [0, 255, 255], 'agriculture': [255, 255, 0], 'rangeland': [255, 0, 255], 'forest': [0, 255, 0], 'water': [0, 0, 255], 'barren': [255, 255, 255]. With this encoding training somehow starts, but after a few steps both losses become NaN, and in the second epoch the accuracy reaches 100. Here is the output:

Epoch: [1][0/114], Time: 8.72, Data: 3.96, lr_encoder: 0.000020, lr_decoder: 0.000020, Accuracy: 7.50, Seg_Loss: 2.249039, Edge_Loss: 0.152145
Epoch: [1][20/114], Time: 2.30, Data: 0.20, lr_encoder: 0.000020, lr_decoder: 0.000020, Accuracy: 45.84, Seg_Loss: 2.592526, Edge_Loss: 0.772081
Epoch: [1][40/114], Time: 2.11, Data: 0.11, lr_encoder: 0.000020, lr_decoder: 0.000020, Accuracy: 49.47, Seg_Loss: nan, Edge_Loss: nan
Epoch: [1][60/114], Time: 2.04, Data: 0.08, lr_encoder: 0.000020, lr_decoder: 0.000020, Accuracy: 65.99, Seg_Loss: nan, Edge_Loss: nan
Epoch: [1][80/114], Time: 2.01, Data: 0.06, lr_encoder: 0.000020, lr_decoder: 0.000020, Accuracy: 74.39, Seg_Loss: nan, Edge_Loss: nan
Epoch: [1][100/114], Time: 1.99, Data: 0.05, lr_encoder: 0.000020, lr_decoder: 0.000020, Accuracy: 79.46, Seg_Loss: nan, Edge_Loss: nan
Saving checkpoints...
Saving history...
Epoch: [2][0/114], Time: 1.90, Data: 0.01, lr_encoder: 0.000020, lr_decoder: 0.000020, Accuracy: 100.00, Seg_Loss: nan, Edge_Loss: nan
Epoch: [2][20/114], Time: 1.94, Data: 0.01, lr_encoder: 0.000020, lr_decoder: 0.000020, Accuracy: 100.00, Seg_Loss: nan, Edge_Loss: nan
Epoch: [2][40/114], Time: 1.92, Data: 0.01, lr_encoder: 0.000020, lr_decoder: 0.000020, Accuracy: 100.00, Seg_Loss: nan, Edge_Loss: nan

I imagine the encoding is wrong, but I can't work out what the correct one is. Furthermore, the images I pass to the model have pixel values in [0, 255], while your normalisation is performed on a [0, 1] range. Should we normalise images to [0, 1] before feeding them to the net, or is that done internally? Could this be the cause of the issue?
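In the meantime, a minimal guard in the training loop would at least pinpoint the exact step where the losses blow up, instead of letting the accuracy silently saturate at 100 (the function name and call site are illustrative, not from the repo):

```python
import math

def guard_loss(step: int, seg_loss: float, edge_loss: float) -> None:
    """Raise as soon as either loss turns NaN so the offending batch can be inspected."""
    if math.isnan(seg_loss) or math.isnan(edge_loss):
        raise RuntimeError(
            f"NaN loss at step {step}: seg={seg_loss}, edge={edge_loss}"
        )
```

Calling this with the logged values above would pass at step 0 and raise at step 40, where Seg_Loss first becomes NaN.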

xsk07 avatar Dec 14 '22 11:12 xsk07

#16 fixes the failing validation step, but the NaNs in Seg_Loss and Edge_Loss are still there!

cnstt avatar Dec 14 '22 17:12 cnstt

Hi @xsk07, labels should be 8-bit PNG images with integer pixel values in the range 0-6, corresponding to the seven default classes (six land-cover classes plus 'unknown') described in DeepGlobe. Then l.168-181 in dataset.py will handle the encoding.
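If it helps, here is a rough sketch of converting an RGB-encoded DeepGlobe mask into the expected integer labels. The index order below is an assumption (it follows the palette you quoted); check l.168-181 of dataset.py for the repo's actual mapping:

```python
import numpy as np

# Palette taken from the mapping quoted above; indices 0-6 are assumed.
PALETTE = {
    (0, 0, 0): 0,        # unknown
    (0, 255, 255): 1,    # urban
    (255, 255, 0): 2,    # agriculture
    (255, 0, 255): 3,    # rangeland
    (0, 255, 0): 4,      # forest
    (0, 0, 255): 5,      # water
    (255, 255, 255): 6,  # barren
}

def rgb_to_index(mask_rgb: np.ndarray) -> np.ndarray:
    """Map each (H, W, 3) RGB pixel to its class index; unmatched pixels become 0."""
    out = np.zeros(mask_rgb.shape[:2], dtype=np.uint8)
    for rgb, idx in PALETTE.items():
        match = np.all(mask_rgb == np.array(rgb, dtype=mask_rgb.dtype), axis=-1)
        out[match] = idx
    return out
```

Saving the result as an 8-bit single-channel PNG should give labels in the form the loader expects.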

Normalization is handled by img_transform.
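For reference, a minimal sketch of what such a transform typically does: scale uint8 images to [0, 1], then standardise per channel. The mean/std values here are the common ImageNet statistics, an assumption on my part; check img_transform for the actual ones:

```python
import numpy as np

# Assumed ImageNet channel statistics; the repo's img_transform may differ.
MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])

def normalise(img_uint8: np.ndarray) -> np.ndarray:
    """Scale a (H, W, 3) uint8 image to [0, 1], then standardise per channel."""
    img = img_uint8.astype(np.float32) / 255.0
    return (img - MEAN) / STD
```

So raw [0, 255] images can be passed in as long as they go through the transform; feeding them to the network without it would produce activations ~255x too large, which is a common way to get NaN losses.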

lxasqjc avatar Dec 14 '22 23:12 lxasqjc

Yeah, labels in our case are exactly as you described, but it's still not working. We saw this issue in #2, so maybe we can continue the discussion there.

xsk07 avatar Dec 15 '22 11:12 xsk07