jetson-inference
Validation Loss: nan, Validation Regression Loss nan, Validation Classification Loss: nan
I trained a model and it worked fine when I followed the Collecting your own Detection Datasets tutorial. But I wanted to improve my model, so I added more photos, following the same steps as in the repo. When I tried to train the model again, the training returned nan.
Why doesn't it calculate the loss?
python3 train_ssd.py --dataset-type=voc --data=data/products --model-dir=models/products5
2020-11-25 08:10:24 - Epoch: 8, Step: 40/44, Avg Loss: nan, Avg Regression Loss nan, Avg Classification Loss: nan
@dusty-nv
Hi @dusty-nv, I have literally the same issue. All my losses during training are nan. I have a multi-label dataset, with each image containing several objects. All images appear to be correctly annotated (Pascal VOC).
initial command : python3 train_ssd.py --data=data/mildiou_detection --model-dir=models/mildiou --batch-size=32 --epochs=30 --dataset-type=voc --validation-mean-ap="True"
command for debug : python3 train_ssd.py --data=data/mildiou_detection --model-dir=models/mildiou --batch-size=1 --epochs=30 --dataset-type=voc --validation-mean-ap="True" --debug-steps=1 --workers=0 --log-level="debug"
I've already trained models on other datasets without any problems. Do you have any idea what's causing this problem?
@nodk15 try removing/fixing the image annotations from the image/XML directly preceding the first nan
@dusty-nv I tried removing them with no success, still the same issue... but the images are correctly annotated. Maybe it's because of the image size (5184x3888)?
Here is the XML file of the first image:
```xml
<annotation>
<folder>JPEGImages</folder>
<filename>Pinot noir - Corin 2021-08-16-18.11.16.jpg</filename>
<path>../JPEGImages/Pinot noir - Corin 2021-08-16-18.11.16.jpg</path>
<source>
<database>Unknown</database>
</source>
<size>
<width>5184</width>
<height>3888</height>
<depth>3</depth>
</size>
<segmented>0</segmented>
<object>
<name>downy mildew oil spot square</name>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>1597</xmin>
<ymin>1429</ymin>
<xmax>2267</xmax>
<ymax>2019</ymax>
</bndbox>
</object>
<object>
<name>downy mildew oil spot square</name>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>1757</xmin>
<ymin>2029</ymin>
<xmax>2207</xmax>
<ymax>2599</ymax>
</bndbox>
</object>
<object>
<name>downy mildew oil spot square</name>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>2307</xmin>
<ymin>1649</ymin>
<xmax>3007</xmax>
<ymax>2409</ymax>
</bndbox>
</object>
<object>
<name>downy mildew oil spot square</name>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>2297</xmin>
<ymin>2389</ymin>
<xmax>2947</xmax>
<ymax>2769</ymax>
</bndbox>
</object>
<object>
<name>downy mildew oil spot square</name>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>3237</xmin>
<ymin>2189</ymin>
<xmax>3497</xmax>
<ymax>2519</ymax>
</bndbox>
</object>
<object>
<name>downy mildew oil spot square</name>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>2437</xmin>
<ymin>849</ymin>
<xmax>3187</xmax>
<ymax>1579</ymax>
</bndbox>
</object>
<object>
<name>downy mildew oil spot square</name>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>3387</xmin>
<ymin>1659</ymin>
<xmax>3867</xmax>
<ymax>2349</ymax>
</bndbox>
</object>
</annotation>
```
@nodk15 did you remove its image ID from your dataset's ImageSets lists? (i.e. is it no longer being loaded)
If so, it might be another image that is causing the NaNs. In that case, you need to keep filtering/removing them until the NaNs are gone or you find out what about them is causing the NaNs.
@dusty-nv Is there a limit to the number of objects per image? Practically all images generate nan values, and I didn't notice anything different between the images with a problem and those without.
Finally, I solved my problem. The problem was indeed in the annotations: many of my annotations had an xmax lower than the xmin... Thank you @dusty-nv for the help :)
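For anyone hitting the same thing: a quick sanity check over the annotation files catches this class of bug before training. Below is a minimal sketch (not part of train_ssd.py) that scans a directory of Pascal VOC XML files and flags any box with inverted or out-of-range coordinates; the `data/mildiou_detection/Annotations` path is just an example and should be replaced with your own dataset path.

```python
# Sanity-check Pascal VOC annotations for invalid bounding boxes
# (xmax <= xmin, ymax <= ymin, or coordinates outside the image),
# which can lead to NaN losses during SSD training.
import glob
import os
import xml.etree.ElementTree as ET

def check_annotations(annotations_dir):
    """Return a list of (xml_path, (xmin, ymin, xmax, ymax)) for every bad box."""
    bad = []
    for xml_path in sorted(glob.glob(os.path.join(annotations_dir, "*.xml"))):
        root = ET.parse(xml_path).getroot()
        size = root.find("size")
        width = int(size.find("width").text)
        height = int(size.find("height").text)
        for obj in root.iter("object"):
            box = obj.find("bndbox")
            xmin = int(float(box.find("xmin").text))
            ymin = int(float(box.find("ymin").text))
            xmax = int(float(box.find("xmax").text))
            ymax = int(float(box.find("ymax").text))
            if (xmax <= xmin or ymax <= ymin
                    or xmin < 0 or ymin < 0
                    or xmax > width or ymax > height):
                bad.append((xml_path, (xmin, ymin, xmax, ymax)))
    return bad

if __name__ == "__main__":
    # Example path -- point this at your own Annotations directory.
    for path, box in check_annotations("data/mildiou_detection/Annotations"):
        print("invalid box", box, "in", path)
```

Run it once before training; any line it prints is an annotation worth fixing or deleting, and it is much faster than bisecting the dataset by hand.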