mmskeleton

Accuracy Issue

Open gravesprite opened this issue 4 years ago • 13 comments

Hi, thank you for sharing your code. I was trying to train a model on the NTU-RGB dataset, following xview/train.yaml exactly. However, the loss did not go down at all. I really need some help; could anyone give me some advice?

INFO:mmcv.runner.runner:Epoch(train) [75][295] loss: 4.1075, top1: 0.0166, top5: 0.0872
INFO:mmcv.runner.runner:Epoch [76][100/588] lr: 0.00100, eta: 0:08:05, time: 0.136, data_time: 0.015, memory: 6742, loss: 4.1307
INFO:mmcv.runner.runner:Epoch [76][200/588] lr: 0.00100, eta: 0:07:47, time: 0.125, data_time: 0.003, memory: 6742, loss: 4.1336
INFO:mmcv.runner.runner:Epoch [76][300/588] lr: 0.00100, eta: 0:07:30, time: 0.127, data_time: 0.004, memory: 6742, loss: 4.1310
INFO:mmcv.runner.runner:Epoch [76][400/588] lr: 0.00100, eta: 0:07:13, time: 0.128, data_time: 0.004, memory: 6742, loss: 4.1332
INFO:mmcv.runner.runner:Epoch [76][500/588] lr: 0.00100, eta: 0:06:55, time: 0.128, data_time: 0.004, memory: 6742, loss: 4.1295
INFO:mmcv.runner.runner:Epoch [77][100/588] lr: 0.00100, eta: 0:06:22, time: 0.138, data_time: 0.015, memory: 6742, loss: 4.1288
INFO:mmcv.runner.runner:Epoch [77][200/588] lr: 0.00100, eta: 0:06:05, time: 0.126, data_time: 0.003, memory: 6742, loss: 4.1325
INFO:mmcv.runner.runner:Epoch [77][300/588] lr: 0.00100, eta: 0:05:48, time: 0.126, data_time: 0.003, memory: 6742, loss: 4.1337
INFO:mmcv.runner.runner:Epoch [77][400/588] lr: 0.00100, eta: 0:05:31, time: 0.126, data_time: 0.004, memory: 6742, loss: 4.1360
INFO:mmcv.runner.runner:Epoch [77][500/588] lr: 0.00100, eta: 0:05:14, time: 0.126, data_time: 0.004, memory: 6742, loss: 4.1318
INFO:mmcv.runner.runner:Epoch [78][100/588] lr: 0.00100, eta: 0:04:41, time: 0.138, data_time: 0.015, memory: 6742, loss: 4.1277
INFO:mmcv.runner.runner:Epoch [78][200/588] lr: 0.00100, eta: 0:04:24, time: 0.125, data_time: 0.003, memory: 6742, loss: 4.1274
INFO:mmcv.runner.runner:Epoch [78][300/588] lr: 0.00100, eta: 0:04:07, time: 0.126, data_time: 0.003, memory: 6742, loss: 4.1361
INFO:mmcv.runner.runner:Epoch [78][400/588] lr: 0.00100, eta: 0:03:50, time: 0.126, data_time: 0.003, memory: 6742, loss: 4.1322
INFO:mmcv.runner.runner:Epoch [78][500/588] lr: 0.00100, eta: 0:03:33, time: 0.126, data_time: 0.003, memory: 6742, loss: 4.1356
INFO:mmcv.runner.runner:Epoch [79][100/588] lr: 0.00100, eta: 0:03:01, time: 0.137, data_time: 0.015, memory: 6742, loss: 4.1330
INFO:mmcv.runner.runner:Epoch [79][200/588] lr: 0.00100, eta: 0:02:44, time: 0.125, data_time: 0.003, memory: 6742, loss: 4.1324
INFO:mmcv.runner.runner:Epoch [79][300/588] lr: 0.00100, eta: 0:02:27, time: 0.125, data_time: 0.003, memory: 6742, loss: 4.1322
INFO:mmcv.runner.runner:Epoch [79][400/588] lr: 0.00100, eta: 0:02:10, time: 0.126, data_time: 0.003, memory: 6742, loss: 4.1268
INFO:mmcv.runner.runner:Epoch [79][500/588] lr: 0.00100, eta: 0:01:53, time: 0.126, data_time: 0.003, memory: 6742, loss: 4.1339
INFO:mmcv.runner.runner:Epoch [80][100/588] lr: 0.00100, eta: 0:01:21, time: 0.137, data_time: 0.015, memory: 6742, loss: 4.1321
INFO:mmcv.runner.runner:Epoch [80][200/588] lr: 0.00100, eta: 0:01:05, time: 0.125, data_time: 0.003, memory: 6742, loss: 4.1380
INFO:mmcv.runner.runner:Epoch [80][300/588] lr: 0.00100, eta: 0:00:48, time: 0.126, data_time: 0.003, memory: 6742, loss: 4.1323
INFO:mmcv.runner.runner:Epoch [80][400/588] lr: 0.00100, eta: 0:00:31, time: 0.126, data_time: 0.003, memory: 6742, loss: 4.1301
INFO:mmcv.runner.runner:Epoch [80][500/588] lr: 0.00100, eta: 0:00:14, time: 0.126, data_time: 0.003, memory: 6742, loss: 4.1315
INFO:mmcv.runner.runner:Epoch(train) [80][295] loss: 4.1073, top1: 0.0165, top5: 0.0875
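
For context, the logged numbers sit close to what a classifier guessing uniformly at random would produce, which suggests the model is not learning at all rather than learning slowly. The sketch below works out that baseline. It assumes 60 action classes (the class count of NTU RGB+D) and a standard cross-entropy loss; the helper name `chance_level_loss` is hypothetical and not part of mmskeleton.

```python
import math


def chance_level_loss(num_classes: int) -> float:
    """Cross-entropy loss of a uniform prediction over num_classes classes."""
    return math.log(num_classes)


# Assumption: NTU RGB+D has 60 action classes.
ntu_classes = 60

baseline_loss = chance_level_loss(ntu_classes)  # ln(60)
baseline_top1 = 1 / ntu_classes                 # random top-1 accuracy
baseline_top5 = 5 / ntu_classes                 # random top-5 accuracy

print(f"loss ~ {baseline_loss:.4f}, top1 ~ {baseline_top1:.4f}, top5 ~ {baseline_top5:.4f}")
```

Under those assumptions the baseline is a loss of about 4.09, a top-1 of about 0.0167 and a top-5 of about 0.0833, all near the values in the log above.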

gravesprite avatar May 13 '20 00:05 gravesprite

Did you find a solution to your problem?

yosagaf avatar May 15 '20 22:05 yosagaf

> Did you find a solution to your problem?

No, the issue still exists.

gravesprite avatar May 19 '20 06:05 gravesprite

Did you find a solution to your problem?

xiaoyang-coder avatar May 30 '20 14:05 xiaoyang-coder

I have exactly the same problem with the NTU-RGB-xsub dataset. Does anyone have a solution for it?

vivek87799 avatar Jun 05 '20 09:06 vivek87799

training_hooks: 
    lr_config: 
      policy: 'step' 
      step: [20, 30, 40, 50] 
    log_config: 
      interval: 100 
      hooks: 
        - type: TextLoggerHook 
    checkpoint_config: 
      interval: 5 
    optimizer_config: 
      grad_clip:

vivek87799 avatar Jun 05 '20 16:06 vivek87799

@vivek87799 thank you

xiaoyang-coder avatar Jun 06 '20 09:06 xiaoyang-coder

I have the same problem training on Kinetics data. Because I only have one GPU, I set gpus: 1 and batch_size: 128. I don't know how to set lr or the other parameters. The loss never converges.

fnxiang avatar Jun 16 '20 16:06 fnxiang

I have the same problem training on Kinetics data. The loss is about 6 and does not converge.

YeTaoY avatar Jun 17 '20 02:06 YeTaoY

Hi guys, what @vivek87799 did fixed my problem.

YeTaoY avatar Jun 17 '20 03:06 YeTaoY

> Hi guys, what @vivek87799 did fixed my problem.

Which file should I modify? train.yaml?

happysheep224 avatar Jul 17 '20 03:07 happysheep224

I didn't try his method, but it should be this file (train.yaml).

------------------ Original Message ------------------
From: "happysheep224"<[email protected]>
Sent: Friday, July 17, 2020, 11:27 AM
To: "open-mmlab/mmskeleton"<[email protected]>
Cc: "1771203081"<[email protected]>; "Comment"<[email protected]>
Subject: Re: [open-mmlab/mmskeleton] Accuracy Issue (#311)

> Hi guys, what @vivek87799 did fixed my problem.

Which file should I modify? train.yaml?

xiaoyang-coder avatar Jul 17 '20 03:07 xiaoyang-coder

@YeTaoY how does this method work? I tried it, but it did not work. Do you have any more tips?

happysheep224 avatar Jul 17 '20 06:07 happysheep224

Just add grad_clip to train.yaml as vivek87799 said; now the loss is decreasing. But I don't understand how this works.
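
On the question of how this works: the thread does not establish the mechanism. A bare `grad_clip:` key with no value parses as null in YAML, so it may be the presence of the `optimizer_config` section, rather than any actual clipping, that matters here; that is an assumption and is not confirmed anywhere above. For reference, gradient clipping by norm in general rescales the gradients when their overall L2 norm exceeds a threshold. A minimal pure-Python sketch of that idea follows. The function name `clip_by_global_norm` and the `max_norm` value are hypothetical, chosen for illustration; this is not mmcv's or mmskeleton's code.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale a flat list of gradient values so their L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        # Already within the limit: leave the gradients unchanged.
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm


# Hypothetical example: a gradient of norm 5 clipped to norm 1.
clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
norm_after = math.sqrt(sum(g * g for g in clipped))
print(norm_before, clipped, norm_after)
```

The direction of the gradient is preserved; only its length is capped. In PyTorch the corresponding utility is `torch.nn.utils.clip_grad_norm_`, which operates on parameter tensors in place.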

paleomoon avatar Oct 10 '20 02:10 paleomoon