pytorch-ssd

Got nan in loss function

Open ijalalfrz opened this issue 4 years ago • 6 comments

Hi, I'm using this for object detection, but I get NaN values for the classification and regression losses. I'm using PyTorch 1.3.1.

ijalalfrz avatar Jun 22 '20 07:06 ijalalfrz

Hi, I had the same problem. This was because of the following reason:

The zeroth class is the background, and my dataset was for detecting only a single object. I had forgotten to change the class label to 1 for the class I was detecting. Perhaps you are having the same issue.
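A quick way to check for this is to scan the integer labels before training. This is only a minimal sketch, not code from pytorch-ssd: `find_label_problems` is a hypothetical helper, and it assumes labels are plain integers with index 0 reserved for the background, as described above.

```python
def find_label_problems(labels, num_classes):
    """Return human-readable problems found in a list of integer class labels.

    Assumption: index 0 is reserved for the background class, so every
    annotated object should carry a label in 1..num_classes-1.
    """
    problems = []
    for i, label in enumerate(labels):
        if label == 0:
            # An annotated object labelled 0 is treated as background.
            problems.append(f"box {i}: labelled 0, which is the background class")
        elif not 0 < label < num_classes:
            problems.append(f"box {i}: label {label} is outside 1..{num_classes - 1}")
    return problems


# Hypothetical single-object dataset: background (0) plus one object class (1).
for problem in find_label_problems([0, 1, 1], num_classes=2):
    print(problem)
```

For a single-object dataset, `num_classes` would be 2 and every annotated box should have label 1.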

sudarshan-kamath avatar Jul 11 '20 09:07 sudarshan-kamath

I am having the same issue. I am training on one class only and all my loss values are NaN. I printed the labels and my only class label is already 1. Where exactly in your code did you find the wrong class label?

When I print class_dict in _read_data in open_images.py (I modified this file to fit my dataset), I get the following output:

{'BACKGROUND': 0, 'Face': 1}

thimabru1010 avatar Aug 04 '20 15:08 thimabru1010

@thimabru1010 were you able to resolve the issue?

zinercodes avatar Mar 03 '21 17:03 zinercodes

I remember I managed to find a solution, but this was a long time ago. Looking at my code, the problem may have been in my box labels: I was forgetting to divide the bounding boxes by the image width and height. I'm not quite sure, though. If you want, I can share my code with you.

thimabru1010 avatar Mar 03 '21 20:03 thimabru1010

@thimabru1010 thanks for the quick reply. Yes, if you share your code, it will be really helpful.

zinercodes avatar Mar 04 '21 03:03 zinercodes

Hey @zinercodes,

Better late than never!

I put the code in my repository: https://github.com/thimabru1010/ssd-thermal-face-detection

I was trying to train the SSD with MobileNet on a custom dataset. To do this I created an organize.py to put the data (boxes and classes) into the right structure. We only need the image_id, the bounding box coordinates, and the class name. I think the problem I was facing was due to the normalization of the bounding boxes: I wasn't dividing by the image width and height. Not sure if this is your problem, but I hope it helps.
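For anyone reading later, the normalization step looks roughly like this. It is a minimal sketch and not the actual code from organize.py: `normalize_boxes` and `boxes_look_normalized` are hypothetical helpers, and the (x_min, y_min, x_max, y_max) pixel layout is an assumption about how the boxes are stored.

```python
def normalize_boxes(boxes, image_width, image_height):
    """Scale pixel boxes (x_min, y_min, x_max, y_max) into the [0, 1] range.

    Assumption: boxes are given in pixels in corner format; adapt the
    unpacking if your annotations use a different layout.
    """
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    normalized = []
    for x_min, y_min, x_max, y_max in boxes:
        normalized.append((
            x_min / image_width,   # x coordinates are divided by the width
            y_min / image_height,  # y coordinates are divided by the height
            x_max / image_width,
            y_max / image_height,
        ))
    return normalized


def boxes_look_normalized(boxes):
    """Return True if every coordinate already lies in [0, 1]."""
    return all(0.0 <= value <= 1.0 for box in boxes for value in box)


# Hypothetical 320x240 image with one box given in pixels.
pixel_boxes = [(80, 60, 160, 120)]
print(boxes_look_normalized(pixel_boxes))
print(normalize_boxes(pixel_boxes, image_width=320, image_height=240))
```

Running `boxes_look_normalized` over the whole dataset before training is a cheap way to spot boxes that are still in pixel coordinates.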

Sorry for the late reply.

Here is a picture of the old commit and the change I made. Hope this guides you.

image

thimabru1010 avatar Apr 05 '21 14:04 thimabru1010