edgeai-yolov5

NaN appears when training on the dataset

Open zhaijs opened this issue 2 years ago • 6 comments


When I started training on the COCO2017 dataset, everything was normal at first, but then NaN values appeared. I suspected exploding gradients as the cause, but I haven't made any changes to the code. This problem has been bothering me for a long time, and I look forward to your suggestions.


zhaijs avatar Jul 01 '22 05:07 zhaijs
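
One way to test the exploding-gradient suspicion is to guard each optimizer step. The following is a minimal sketch, not code from this repository: `guarded_step` and the `max_norm` default are hypothetical, and it assumes a standard PyTorch model and optimizer. It skips the update when the loss is already non-finite and clips the gradient norm otherwise.

```python
import torch


def guarded_step(model, optimizer, loss, max_norm=10.0):
    """Hypothetical helper: skip non-finite losses, otherwise clip and step.

    Returns True if the optimizer step was taken, False if it was skipped.
    """
    optimizer.zero_grad()
    if not torch.isfinite(loss):
        # The loss is NaN or inf: do not backpropagate it into the weights.
        return False
    loss.backward()
    # Rescale gradients so their total norm does not exceed max_norm
    # (max_norm=10.0 is an illustrative value, not a recommendation).
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return True
```

If steps are skipped even with clipping enabled, the NaN is probably produced inside the loss computation itself rather than by large gradients.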

There may be something wrong with the OKS loss, so I'm trying a different loss function.

zhaijs avatar Jul 01 '22 07:07 zhaijs

Hi @zhaijs, did you solve this problem? It would be nice if you could share your solution. Thank you.

wanglegoc avatar Aug 11 '22 09:08 wanglegoc

I ran into the same problem. Can you share a solution?

GeorgeInbelgium avatar Oct 31 '22 08:10 GeorgeInbelgium

I found that some of my label files are empty. My solution was to fix those label files.

GeorgeInbelgium avatar Nov 02 '22 07:11 GeorgeInbelgium
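
The empty-label check described above can be sketched as a short script. This is an illustration only: `find_empty_label_files` is a hypothetical helper, and it assumes YOLO-style `.txt` label files stored under a single directory.

```python
from pathlib import Path


def find_empty_label_files(label_dir):
    """Return sorted paths of .txt label files that are empty or whitespace-only."""
    empty = []
    for path in sorted(Path(label_dir).rglob("*.txt")):
        # A file with no non-whitespace characters contains no annotations.
        if not path.read_text().strip():
            empty.append(path)
    return empty
```

Calling `find_empty_label_files` on your own labels directory returns the files to inspect. Whether an empty file should be fixed or removed depends on whether the image really contains no objects.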

I checked the dataset and found no errors. When computing kpt_loss_factor, I added 1e-9 to the denominator:
kpt_loss_factor = (torch.sum(kpt_mask != 0) + torch.sum(kpt_mask == 0))/(torch.sum(kpt_mask != 0)+1e-9)

zhaijs avatar Nov 02 '22 07:11 zhaijs
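
To see why the epsilon matters, consider a batch in which no keypoint is labeled, so `kpt_mask` is all zeros. The sketch below uses made-up masks and assumes the factor is later multiplied by a masked loss that is zero in that case; it is not the repository's training code.

```python
import torch


def kpt_loss_factor(kpt_mask, eps=1e-9):
    """Ratio of all keypoints to labeled keypoints, with eps guarding the division."""
    labeled = torch.sum(kpt_mask != 0)
    unlabeled = torch.sum(kpt_mask == 0)
    return (labeled + unlabeled) / (labeled + eps)


# Hypothetical masks: two of four keypoints labeled, and none labeled.
some_labeled = torch.tensor([1.0, 0.0, 1.0, 0.0])
none_labeled = torch.zeros(4)

# Without eps, an all-zero mask divides by zero and yields inf.
unguarded = (torch.sum(none_labeled != 0) + torch.sum(none_labeled == 0)) / torch.sum(none_labeled != 0)

# inf times a zero masked loss is NaN, which then poisons the total loss.
nan_term = unguarded * 0.0

# With eps the factor is huge but finite, so a zero masked loss stays zero.
safe_term = kpt_loss_factor(none_labeled) * 0.0
```

The guarded factor is still very large for an all-zero mask, so this only helps when the term it scales is exactly zero in that case.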

I ran into the same issue. Is there any solution or workaround for it?

liamsun2019 avatar Jan 10 '23 06:01 liamsun2019