mmfewshot
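NaN losses when running the TFA algorithm
I set up mmfewshot and the VOC dataset following the README. When I run TFA base training with the bundled config file, the losses become NaN once training passes 100 iterations. What could be causing this?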
2023-05-16 15:59:36,050 - mmfewshot - INFO - Iter [50/18000] lr: 9.810e-03, eta: 1:13:05, time: 0.244, data_time: 0.007, memory: 7041, loss_rpn_cls: 0.2213, loss_rpn_bbox: 0.0396, loss_cls: 0.5162, acc: 92.1133, loss_bbox: 0.0955, loss: 0.8727
2023-05-16 15:59:48,591 - mmfewshot - INFO - Iter [100/18000] lr: 1.980e-02, eta: 1:13:52, time: 0.251, data_time: 0.007, memory: 7041, loss_rpn_cls: 0.1074, loss_rpn_bbox: 0.0503, loss_cls: 0.2786, acc: 96.0000, loss_bbox: 0.1581, loss: 0.5944
2023-05-16 16:00:00,323 - mmfewshot - INFO - Iter [150/18000] lr: 2.000e-02, eta: 1:12:24, time: 0.235, data_time: 0.006, memory: 7041, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 79.5828, loss_bbox: nan, loss: nan
2023-05-16 16:00:10,732 - mmfewshot - INFO - Iter [200/18000] lr: 2.000e-02, eta: 1:09:35, time: 0.208, data_time: 0.006, memory: 7041, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 2.6863, loss_bbox: nan, loss: nan
2023-05-16 16:00:21,427 - mmfewshot - INFO - Iter [250/18000] lr: 2.000e-02, eta: 1:08:09, time: 0.214, data_time: 0.009, memory: 7041, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 1.0000, loss_bbox: nan, loss: nan
We recommend using English or English & Chinese for issues so that we can have a broader discussion.
Try reducing the batch_size.
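For anyone trying this, here is a rough sketch of what such a config override might look like. It is not taken from the mmfewshot repository. It assumes the MMDetection 2.x style config keys (`data.samples_per_gpu`, `optimizer`, `optimizer_config`) that mmfewshot configs build on. The GPU count, the reference batch size of 16, and the `scale_lr` helper are hypothetical and only for illustration. The log above shows the losses turning NaN right after the learning rate finishes warming up to 2.000e-02, so the sketch also scales the learning rate down together with the batch size (the linear scaling rule), which may or may not be the actual fix here.

```python
# Illustrative config override; all concrete values are assumptions.

def scale_lr(base_lr, base_batch_size, new_batch_size):
    """Linear scaling rule: keep lr proportional to the total batch size."""
    return base_lr * new_batch_size / base_batch_size

num_gpus = 1           # assumption: single-GPU training
samples_per_gpu = 2    # assumption: reduced per-GPU batch size
base_lr = 0.02         # post-warmup lr seen in the log above
base_batch_size = 16   # assumption: the lr was tuned for 8 GPUs x 2 images

# Smaller per-GPU batch, as suggested in the reply.
data = dict(samples_per_gpu=samples_per_gpu)

# Learning rate scaled to the smaller total batch size.
optimizer = dict(
    lr=scale_lr(base_lr, base_batch_size, num_gpus * samples_per_gpu))

# Optional: gradient clipping is another common guard against NaN losses.
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
```

In an MMDetection-style config that inherits from a `_base_` file, dicts like these are merged into the inherited settings, so only the overridden keys need to be listed.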