
In my task, it always returns nan. Maybe it is not as widely used as BCE?

Open MangoFF opened this issue 2 years ago • 3 comments

Why does it always return nan? Here is my log:

(l1_loss): L1Loss()
(new_loss): AsymmetricLossOptimized()
(bcewithlog_loss): AsymmetricLossOptimized()
(iou_loss): IOUloss()
)
)
2022-04-20 13:03:33 | INFO | yolox.core.trainer:202 - ---> start train epoch1
2022-04-20 13:03:39 | INFO | yolox.core.trainer:260 - epoch: 1/45, iter: 10/646, mem: 5053Mb, iter_time: 0.555s, data_time: 0.001s, total_loss: nan, iou_loss: 2.4, l1_loss: 0.0, conf_loss: nan, cls_loss: 0.0, lr: 1.498e-10, size: 480, ETA: 4:28:59
2022-04-20 13:03:45 | INFO | yolox.core.trainer:260 - epoch: 1/45, iter: 20/646, mem: 5571Mb, iter_time: 0.572s, data_time: 0.001s, total_loss: nan, iou_loss: 3.1, l1_loss: 0.0, conf_loss: nan, cls_loss: 0.0, lr: 5.991e-10, size: 640, ETA: 4:33:02
2022-04-20 13:03:48 | INFO | yolox.core.trainer:260 - epoch: 1/45, iter: 30/646, mem: 5571Mb, iter_time: 0.324s, data_time: 0.001s, total_loss: nan, iou_loss: 3.1, l1_loss: 0.0, conf_loss: nan, cls_loss: 0.0, lr: 1.348e-09, size: 384, ETA: 3:54:11
2022-04-20 13:03:52 | INFO | yolox.core.trainer:260 - epoch: 1/45, iter: 40/646, mem: 5571Mb, iter_time: 0.380s, data_time: 0.000s, total_loss: nan, iou_loss: 2.3, l1_loss: 0.0, conf_loss: nan, cls_loss: 0.0, lr: 2.396e-09, size: 448, ETA: 3:41:30
2022-04-20 13:03:56 | INFO | yolox.core.trainer:260 - epoch: 1/45, iter: 50/646, mem: 5571Mb, iter_time: 0.442s, data_time: 0.000s, total_loss: nan, iou_loss: 2.3, l1_loss: 0.0, conf_loss: nan, cls_loss: 0.0, lr: 3.744e-09, size: 512, ETA: 3:39:53
2022-04-20 13:03:59 | INFO | yolox.core.trainer:260 - epoch: 1/45, iter: 60/646, mem: 5571Mb, iter_time: 0.283s, data_time: 0.001s, total_loss: nan, iou_loss: 2.7, l1_loss: 0.0, conf_loss: nan, cls_loss: 0.0, lr: 5.392e-09, size: 320, ETA: 3:25:58
2022-04-20 13:04:02 | INFO | yolox.core.trainer:260 - epoch: 1/45, iter: 70/646, mem: 5571Mb, iter_time: 0.275s, data_time: 0.001s, total_loss: nan, iou_loss: 2.7, l1_loss: 0.0, conf_loss: nan, cls_loss: 0.0, lr: 7.339e-09, size: 448, ETA: 3:15:28
2022-04-20 13:04:05 | INFO | yolox.core.trainer:260 - epoch: 1/45, iter: 80/646, mem: 5571Mb, iter_time: 0.293s, data_time: 0.001s, total_loss: nan, iou_loss: 2.4, l1_loss: 0.0, conf_loss: nan, cls_loss: 0.0, lr: 9.585e-09, size: 512, ETA: 3:08:40
2022-04-20 13:04:07 | INFO | yolox.core.trainer:260 - epoch: 1/45, iter: 90/646, mem: 5571Mb, iter_time: 0.228s, data_time: 0.001s, total_loss: nan, iou_loss: 2.5, l1_loss: 0.0, conf_loss: nan, cls_loss: 0.0, lr: 1.213e-08, size: 384, ETA: 2:59:52
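One thing worth checking (a guess, not specific to this repo's exact code): if a predicted probability saturates to exactly 0, its log term is -inf, and 0 * (-inf) is NaN in floating point, so a single saturated entry poisons the whole summed loss. This is easier to hit with mixed-precision training. A tiny PyTorch illustration with made-up values:

import torch

# A probability that saturated to exactly 0 alongside its zero target weight.
xs_pos = torch.tensor([1.0, 0.0])
targets = torch.tensor([1.0, 0.0])

bad = targets * torch.log(xs_pos)                     # [0., nan] because 0 * (-inf) = nan
good = targets * torch.log(xs_pos.clamp(min=1e-8))    # [0., 0.]  after clamping before log

print(bad, bad.sum())    # the NaN survives the reduction
print(good, good.sum())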

MangoFF avatar Apr 20 '22 05:04 MangoFF

@MangoFF, I am also having trouble applying the loss to another task. The loss values are very high, around the 3k-5k range, and the loss doesn't decrease much from there. I'm not sure whether the loss is sensitive to the default hyperparameter values or whether something else is wrong.
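One possible (hedged) explanation for the magnitude: if the reduction is a sum over every sample and class rather than a mean, values in the thousands are expected and don't by themselves indicate divergence. A quick illustration with made-up numbers:

import torch

batch_size, num_classes = 64, 80
per_element = torch.full((batch_size, num_classes), 0.7)  # typical per-term BCE-scale values

summed = per_element.sum()                    # 3584.0: looks alarming, but it's just the reduction
averaged = per_element.mean()                 # 0.7: same signal, normalized
per_sample = per_element.sum() / batch_size   # 56.0: sum over classes, mean over the batch

print(summed.item(), averaged.item(), per_sample.item())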

dineshdaultani avatar Jul 08 '22 03:07 dineshdaultani

I also get NaNs when using ASL on a custom multi-label classification task. Everything seemed to work fine when I tested with gamma_neg=0, gamma_pos=0 and gamma_neg=2, gamma_pos=2. However, it seems that I get NaNs as soon as I choose gamma_neg to be larger than gamma_pos. Maybe an issue with numeric stability?
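If it is a numerical-stability problem, one generic fix is to clamp the probabilities before taking the log and to avoid in-place updates around the focusing term. Below is a minimal sketch of an asymmetric-focal-style loss with explicit clamping; the class name and defaults are my own for illustration, and it is not the repository's AsymmetricLossOptimized:

import torch
import torch.nn as nn

class StableAsymmetricLoss(nn.Module):
    # Hypothetical sketch: asymmetric focusing + probability shifting with eps clamping.
    def __init__(self, gamma_neg=4, gamma_pos=1, clip=0.05, eps=1e-8):
        super().__init__()
        self.gamma_neg, self.gamma_pos = gamma_neg, gamma_pos
        self.clip, self.eps = clip, eps

    def forward(self, logits, targets):
        # logits: raw scores (N, C); targets: multi-hot labels (N, C)
        xs_pos = torch.sigmoid(logits)
        xs_neg = 1.0 - xs_pos

        # Asymmetric clipping (probability shifting) for the negatives.
        if self.clip > 0:
            xs_neg = (xs_neg + self.clip).clamp(max=1)

        # Clamping before log prevents log(0) -> -inf -> NaN.
        los_pos = targets * torch.log(xs_pos.clamp(min=self.eps))
        los_neg = (1 - targets) * torch.log(xs_neg.clamp(min=self.eps))

        # Asymmetric focusing with per-element gamma (gamma_neg for negatives).
        pt = xs_pos * targets + xs_neg * (1 - targets)
        gamma = self.gamma_pos * targets + self.gamma_neg * (1 - targets)
        focal_weight = torch.pow(1 - pt, gamma)

        loss = (los_pos + los_neg) * focal_weight  # out-of-place, autograd-friendly
        return -loss.sum()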

IsabelFunke avatar Oct 28 '22 16:10 IsabelFunke

I also get NaNs when using ASL on a custom multi-label classification task. Everything seemed to work fine when I tested with gamma_neg=0, gamma_pos=0 and gamma_neg=2, gamma_pos=2. However, it seems that I get NaNs as soon as I choose gamma_neg to be larger than gamma_pos. Maybe an issue with numeric stability?

Adam or AdamW is recommended as the optimizer; SGD is not recommended.
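For reference, a minimal sketch of swapping SGD for AdamW in PyTorch (the model head, learning rate, and weight decay below are placeholder values, not recommendations from this repo):

import torch

model = torch.nn.Linear(2048, 80)  # placeholder multi-label classification head

# optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)  # not recommended here
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)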

sorrowyn avatar Nov 24 '22 02:11 sorrowyn