jetson-inference

Validation Loss: nan, Validation Regression Loss nan, Validation Classification Loss: nan

Open AhmetEnesYalcinkaya opened this issue 4 years ago • 6 comments

I trained a model by following the "Collecting your own Detection Datasets" tutorial, and it worked fine. I then wanted to improve the model, so I added more photos, following the same procedure as in the repo. When I trained again, every loss came back as nan. Why were the values not calculated?

python3 train_ssd.py --dataset-type=voc --data=data/products --model-dir=models/products5

2020-11-25 08:10:24 - Epoch: 8, Step: 40/44, Avg Loss: nan, Avg Regression Loss nan, Avg Classification Loss: nan

[Screenshot from 2020-11-25 11-07-06]

@dusty-nv

AhmetEnesYalcinkaya, Nov 25 '20 08:11

Hi @dusty-nv, I have literally the same issue. All my losses during training are nan. I have a multi-label dataset, with each image containing several objects. All images appear to be correctly annotated (Pascal VOC).

Initial command: python3 train_ssd.py --data=data/mildiou_detection --model-dir=models/mildiou --batch-size=32 --epochs=30 --dataset-type=voc --validation-mean-ap="True"

Debug command: python3 train_ssd.py --data=data/mildiou_detection --model-dir=models/mildiou --batch-size=1 --epochs=30 --dataset-type=voc --validation-mean-ap="True" --debug-steps=1 --workers=0 --log-level="debug"

I've already trained models on other datasets without any problems. Do you have any idea what's causing this problem?

anassee15, Jul 18 '23 21:07

@nodk15 try removing or fixing the annotations of the image/XML directly preceding the first nan.

dusty-nv, Jul 18 '23 21:07
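To act on the suggestion above, it helps to know where the first nan shows up. The following is a minimal sketch, assuming the training output was saved to a text file and that its lines look like the log line quoted in the first post. The function name `first_nan_step` is hypothetical and not part of train_ssd.py. Note that if the data loader shuffles the images, the step number only tells you when the nan appeared, not which image caused it.

```python
import re

# Pattern modelled on the log line quoted in this thread, e.g.
# "2020-11-25 08:10:24 - Epoch: 8, Step: 40/44, Avg Loss: nan, ..."
LOG_LINE = re.compile(
    r"Epoch:\s*(\d+),\s*Step:\s*(\d+)/(\d+),\s*Avg Loss:\s*(\S+?),"
)


def first_nan_step(log_lines):
    """Return (epoch, step) of the first line whose average loss is nan, else None."""
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and match.group(4).lower() == "nan":
            return int(match.group(1)), int(match.group(2))
    return None
```

Running it with `--debug-steps=1` and `--batch-size=1`, as in the debug command above, should give the finest granularity, since a loss line is then printed for every step.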

@dusty-nv I tried removing it with no success; the issue is still the same, even though the images are correctly annotated. Could it be because of the image size (5184x3888)?

Here is the XML file of the first image:

`

    <annotation>
        <folder>JPEGImages</folder>
        <filename>Pinot noir - Corin 2021-08-16-18.11.16.jpg</filename>
        <path>../JPEGImages/Pinot noir - Corin 2021-08-16-18.11.16.jpg</path>
        <source>
            <database>Unknown</database>
        </source>
        <size>
            <width>5184</width>
            <height>3888</height>
            <depth>3</depth>
        </size>
        <segmented>0</segmented>
        <object>
            <name>downy mildew oil spot square</name>
            <truncated>0</truncated>
            <difficult>0</difficult>
            <bndbox>
                <xmin>1597</xmin>
                <ymin>1429</ymin>
                <xmax>2267</xmax>
                <ymax>2019</ymax>
            </bndbox>
        </object>
        <object>
            <name>downy mildew oil spot square</name>
            <truncated>0</truncated>
            <difficult>0</difficult>
            <bndbox>
                <xmin>1757</xmin>
                <ymin>2029</ymin>
                <xmax>2207</xmax>
                <ymax>2599</ymax>
            </bndbox>
        </object>
        <object>
            <name>downy mildew oil spot square</name>
            <truncated>0</truncated>
            <difficult>0</difficult>
            <bndbox>
                <xmin>2307</xmin>
                <ymin>1649</ymin>
                <xmax>3007</xmax>
                <ymax>2409</ymax>
            </bndbox>
        </object>
        <object>
            <name>downy mildew oil spot square</name>
            <truncated>0</truncated>
            <difficult>0</difficult>
            <bndbox>
                <xmin>2297</xmin>
                <ymin>2389</ymin>
                <xmax>2947</xmax>
                <ymax>2769</ymax>
            </bndbox>
        </object>
        <object>
            <name>downy mildew oil spot square</name>
            <truncated>0</truncated>
            <difficult>0</difficult>
            <bndbox>
                <xmin>3237</xmin>
                <ymin>2189</ymin>
                <xmax>3497</xmax>
                <ymax>2519</ymax>
            </bndbox>
        </object>
        <object>
            <name>downy mildew oil spot square</name>
            <truncated>0</truncated>
            <difficult>0</difficult>
            <bndbox>
                <xmin>2437</xmin>
                <ymin>849</ymin>
                <xmax>3187</xmax>
                <ymax>1579</ymax>
            </bndbox>
        </object>
        <object>
            <name>downy mildew oil spot square</name>
            <truncated>0</truncated>
            <difficult>0</difficult>
            <bndbox>
                <xmin>3387</xmin>
                <ymin>1659</ymin>
                <xmax>3867</xmax>
                <ymax>2349</ymax>
            </bndbox>
        </object>
    </annotation>

`

anassee15, Jul 18 '23 22:07

@nodk15 did you remove its image ID from your dataset's ImageLists? (i.e. is it no longer being loaded)

If so, it might be another image that is causing the NaNs. In that case, you need to keep filtering/removing images until the NaNs are gone or you find out what about them is causing the NaNs.

dusty-nv, Jul 19 '23 20:07
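Removing an image ID from the dataset's lists, as described above, could be scripted roughly as follows. This is only a sketch under assumptions: it supposes the list files are plain text with one image ID per line (as in the standard Pascal VOC layout), and the helper names `drop_image_ids` and `rewrite_list_file` are hypothetical. Whole lines are compared rather than split on whitespace, because IDs in this thread contain spaces.

```python
from pathlib import Path


def drop_image_ids(ids, bad_ids):
    """Return the IDs not listed in bad_ids, preserving order and skipping blank lines."""
    bad = {b.strip() for b in bad_ids}
    stripped = (line.strip() for line in ids)
    return [i for i in stripped if i and i not in bad]


def rewrite_list_file(list_path, bad_ids):
    """Rewrite one image-list file in place without the given IDs (back it up first)."""
    path = Path(list_path)
    kept = drop_image_ids(path.read_text().splitlines(), bad_ids)
    path.write_text("\n".join(kept) + "\n")
    return kept
```

The same ID usually needs to be dropped from every list that references it (training, validation, and so on), otherwise the image is still loaded.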

@dusty-nv Is there a limit to the number of objects per image? Practically all images generate nan values, and I didn't notice anything different between the images with a problem and those without.

anassee15, Jul 20 '23 14:07

I finally solved my problem. It was indeed in the annotations: many of them had an xmax lower than the xmin. Thank you @dusty-nv for the help :)

anassee15, Jul 25 '23 15:07
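The root cause reported above (boxes whose xmax is lower than xmin) can be checked for automatically before training. Below is a minimal sketch that parses one Pascal VOC annotation and reports suspicious boxes. The function names are hypothetical, and the extra checks beyond xmax/xmin (ymax/ymin, negative values, boxes outside the image) are assumptions about what else might go wrong rather than something confirmed in this thread. One plausible mechanism for the nan, assuming the trainer encodes box sizes with a logarithm of width and height ratios as SSD implementations commonly do, is that a negative width has no real logarithm.

```python
import xml.etree.ElementTree as ET


def _num(element, tag):
    """Read a numeric child tag, or return None if it is absent or malformed."""
    try:
        return float(element.findtext(tag))
    except (TypeError, ValueError):
        return None


def find_bad_boxes(xml_text):
    """Return a list of (object_name, reason) for every suspicious bndbox."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = _num(size, "width") if size is not None else None
    height = _num(size, "height") if size is not None else None

    problems = []
    for obj in root.iter("object"):
        name = obj.findtext("name", default="?")
        box = obj.find("bndbox")
        tags = ("xmin", "ymin", "xmax", "ymax")
        coords = [_num(box, t) for t in tags] if box is not None else [None]
        if None in coords:
            problems.append((name, "missing or non-numeric coordinate"))
            continue
        xmin, ymin, xmax, ymax = coords
        if xmax <= xmin:
            problems.append((name, "xmax <= xmin"))
        if ymax <= ymin:
            problems.append((name, "ymax <= ymin"))
        if xmin < 0 or ymin < 0:
            problems.append((name, "negative coordinate"))
        if width is not None and xmax > width:
            problems.append((name, "xmax > image width"))
        if height is not None and ymax > height:
            problems.append((name, "ymax > image height"))
    return problems
```

To scan a whole dataset, one could loop over the annotation files (for example with `pathlib.Path(...).glob("*.xml")` on the dataset's annotation directory, whose exact location is an assumption here), call `find_bad_boxes(path.read_text())` on each, and print the file name whenever the returned list is non-empty.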