DRENet
Performance decrease during training
Hi, do you have any suggestions for overcoming this problem during training? The loss becomes NaN at epoch 7 and all validation metrics drop to 0 afterwards:
Epoch gpu_mem box obj cls dgi total targets img_size
0/199 11G 0.1279 0.01601 0 0.008378 2.849 6 512: 100%|█| 1800/1800 [14:48<00
Class Images Targets P R mAP@.5 mAP@.5:.95: 100%|█| 344/344 [01:07<00
30.39782691001892
all 2.75e+03 4.51e+03 0 0 5.13e-06 9.81e-07
Epoch gpu_mem box obj cls dgi total targets img_size
1/199 11G 0.1261 0.01524 0 0.005636 2.846 6 512: 100%|█| 1800/1800 [14:03<00
Class Images Targets P R mAP@.5 mAP@.5:.95: 100%|█| 344/344 [01:01<00
31.465840816497803
all 2.75e+03 4.51e+03 0 0 3.55e-06 6.64e-07
Epoch gpu_mem box obj cls dgi total targets img_size
2/199 11G 0.1214 0.01546 0 0.005382 2.844 14 512: 100%|█| 1800/1800 [13:46<00
Class Images Targets P R mAP@.5 mAP@.5:.95: 100%|█| 344/344 [01:03<00
32.228920221328735
all 2.75e+03 4.51e+03 0.321 0.297 0.194 0.0497
Epoch gpu_mem box obj cls dgi total targets img_size
3/199 11G 0.1142 0.01436 0 0.005227 2.839 20 512: 100%|█| 1800/1800 [13:39<00
Class Images Targets P R mAP@.5 mAP@.5:.95: 100%|█| 344/344 [00:58<00
28.6451997756958
all 2.75e+03 4.51e+03 0.316 0.485 0.345 0.0999
Epoch gpu_mem box obj cls dgi total targets img_size
4/199 11G 0.09978 0.01415 0 0.005147 2.832 7 512: 100%|█| 1800/1800 [13:23<00
Class Images Targets P R mAP@.5 mAP@.5:.95: 100%|█| 344/344 [00:57<00
28.444270849227905
all 2.75e+03 4.51e+03 0.408 0.578 0.472 0.167
Epoch gpu_mem box obj cls dgi total targets img_size
5/199 11G 0.09265 0.01457 0 0.005125 2.829 5 512: 100%|█| 1800/1800 [13:32<00
Class Images Targets P R mAP@.5 mAP@.5:.95: 100%|█| 344/344 [01:02<00
30.84639859199524
all 2.75e+03 4.51e+03 0.399 0.623 0.507 0.161
Epoch gpu_mem box obj cls dgi total targets img_size
6/199 11G 0.08306 0.01727 0 0.005281 2.825 10 512: 100%|█| 1800/1800 [13:44<00
Class Images Targets P R mAP@.5 mAP@.5:.95: 100%|█| 344/344 [01:01<00
30.013824462890625
all 2.75e+03 4.51e+03 0.285 0.589 0.453 0.145
Epoch gpu_mem box obj cls dgi total targets img_size
7/199 11G nan nan 0 0.005711 nan 6 512: 100%|█| 1800/1800 [13:36<00
Class Images Targets P R mAP@.5 mAP@.5:.95: 100%|█| 344/344 [00:51<00
31.282738208770752
all 2.75e+03 4.51e+03 0 0 1.57e-06 1.74e-07
Epoch gpu_mem box obj cls dgi total targets img_size
8/199 11G nan nan 0 nan nan 10 512: 100%|█| 1800/1800 [13:31<00
Class Images Targets P R mAP@.5 mAP@.5:.95: 100%|█| 344/344 [00:49<00
32.83151125907898
all 2.75e+03 0 0 0 0 0
Epoch gpu_mem box obj cls dgi total targets img_size
9/199 11G nan nan 0 nan nan 9 512: 100%|█| 1800/1800 [13:20<00
Class Images Targets P R mAP@.5 mAP@.5:.95: 100%|█| 344/344 [00:45<00
29.580291509628296
all 2.75e+03 0 0 0 0 0
Epoch gpu_mem box obj cls dgi total targets img_size
10/199 11G nan nan 0 nan nan 4 512: 100%|█| 1800/1800 [13:25<00
Class Images Targets P R mAP@.5 mAP@.5:.95: 100%|█| 344/344 [00:48<00
32.03327965736389
all 2.75e+03 0 0 0 0 0
Epoch gpu_mem box obj cls dgi total targets img_size
11/199 11G nan nan 0 nan nan 9 512: 100%|█| 1800/1800 [13:28<00
Class Images Targets P R mAP@.5 mAP@.5:.95: 100%|█| 344/344 [00:47<00
30.341226816177368
all 2.75e+03 0 0 0 0 0
Epoch gpu_mem box obj cls dgi total targets img_size
12/199 11G nan nan 0 nan nan 2 512: 100%|█| 1800/1800 [13:11<00
Class Images Targets P R mAP@.5 mAP@.5:.95: 100%|█| 344/344 [00:45<00
29.359901189804077
all 2.75e+03 0 0 0 0 0
Epoch gpu_mem box obj cls dgi total targets img_size
13/199 11G nan nan 0 nan nan 13 512: 100%|█| 1800/1800 [13:05<00
Class Images Targets P R mAP@.5 mAP@.5:.95: 100%|█| 344/344 [00:45<00
29.436581134796143
all 2.75e+03 0 0 0 0 0
Epoch gpu_mem box obj cls dgi total targets img_size
14/199 11G nan nan 0 nan nan 7 512: 100%|█| 1800/1800 [13:04<00
Class Images Targets P R mAP@.5 mAP@.5:.95: 100%|█| 344/344 [00:45<00
29.631073713302612
all 2.75e+03 0 0 0 0 0
Epoch gpu_mem box obj cls dgi total targets img_size
15/199 11G nan nan 0 nan nan 6 512: 100%|█| 1800/1800 [13:08<00
Class Images Targets P R mAP@.5 mAP@.5:.95: 100%|█| 344/344 [00:45<00
29.1485652923584
all 2.75e+03 0 0 0 0 0
Epoch gpu_mem box obj cls dgi total targets img_size
16/199 11G nan nan 0 nan nan 18 512: 100%|█| 1800/1800 [13:14<00
Class Images Targets P R mAP@.5 mAP@.5:.95: 100%|█| 344/344 [00:46<00
29.673731088638306
all 2.75e+03 0 0 0 0 0
There seems to be a gradient explosion (or something else) that leads to a NaN loss value: box and obj turn to nan at epoch 7, and dgi follows at epoch 8. What about lowering the learning rate, or clipping the gradients before optimizer.step()?
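To make the clipping suggestion concrete, here is a minimal pure-Python sketch of what clipping by global norm does. It is only an illustration, not DRENet's training code: the function name `clip_by_global_norm` and the plain list of floats standing in for gradients are assumptions made for this example. In a PyTorch training loop the same effect is usually obtained by calling `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)` between `loss.backward()` and `optimizer.step()`.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale gradients so their combined L2 norm is at most max_norm.

    Returns (clipped_grads, total_norm). If the norm is NaN or inf,
    clipped_grads is None so the caller can skip the optimizer step
    instead of writing NaN into the weights.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if not math.isfinite(total_norm):
        # A non-finite gradient cannot be repaired by rescaling.
        return None, total_norm
    # The small epsilon avoids division by zero when all gradients are 0.
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads], total_norm


# Hypothetical gradient values, chosen only to show the three cases.
large, large_norm = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
small, small_norm = clip_by_global_norm([0.3, 0.4], max_norm=1.0)
broken, broken_norm = clip_by_global_norm([float("nan"), 1.0], max_norm=1.0)
```

Large gradients are scaled down, gradients already under the threshold are left untouched, and a NaN gradient is reported so the update can be skipped. Clipping only helps while the gradients are still finite, so if the loss is already NaN before the backward pass, lowering the learning rate or checking the inputs and labels is the next thing to try.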