nsganetv2

get NAN loss

Open renqianluo opened this issue 3 years ago • 1 comments

Hi, I used train_imagenet.py to train the searched architecture, following the guidance in the README. I trained on 4 GPUs, with all hyperparameters as given in the README plus --num-gpu=4, but I got 'Loss NAN' at the very beginning of training.

renqianluo, Mar 15 '21 05:03

Same here, but I used the evaluator script to test a subnet of MobileNet V3:

cifar10 Train Epoch #1: 100%|█| 391/391 [01:01<00:00, 6.38it/s, loss=nan, top1=
Validate Epoch #1 : 100%|█| 50/50 [00:07<00:00, 6.72it/s, loss=nan, top1=10, to

Edit: it looks like the loss is already far too high at the very start of training, and the learning rate is set to 0.01:

cifar10 Train Epoch #1: 0%| | 1/391 [04:22<28:26:08, 262.48s/it, loss=2.71e+8

vinh-cao, Sep 21 '22 08:09
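
For anyone debugging the same symptom (a loss around 2.71e+8 on the first step that then becomes nan with a learning rate of 0.01), here is a minimal sketch of two common guards: a linear learning-rate warmup and a check that stops training on the first non-finite loss. This is not code from nsganetv2. The names `warmup_lr`, `check_loss`, and `warmup_steps`, as well as the value 500, are hypothetical and only illustrate the idea. Whether warmup actually fixes this issue is an assumption that would need testing.

```python
import math


def warmup_lr(step, base_lr=0.01, warmup_steps=500):
    """Linearly ramp the learning rate up to base_lr over warmup_steps.

    base_lr=0.01 matches the value reported in the thread;
    warmup_steps=500 is an arbitrary, hypothetical choice.
    """
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr


def check_loss(loss_value, step):
    """Fail fast when the loss is nan or inf instead of training on."""
    if not math.isfinite(loss_value):
        raise FloatingPointError(
            f"non-finite loss {loss_value} at step {step}"
        )
    return loss_value
```

In a training loop, one would set the optimizer's learning rate to `warmup_lr(step)` before each update and pass the scalar loss through `check_loss` right after the forward pass, so the first bad step is reported instead of a whole epoch of `loss=nan`.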