mtcnn

Training PNet is so slow

Open tzhang2014 opened this issue 7 years ago • 10 comments

When I run python example/train_P_net.py --gpus 0 (my GPU is a 1070), I get:

INFO:root:Epoch[0] Batch [200] Speed: 123.25 samples/sec Train-Accuracy=0.697969
INFO:root:Epoch[0] Batch [200] Speed: 123.25 samples/sec Train-LogLoss=0.617246
INFO:root:Epoch[0] Batch [200] Speed: 123.25 samples/sec Train-BBOX_MSE=0.103584

Can you help me? Is something wrong? Where is the mistake? Thanks.

tzhang2014 • Jan 31 '18 06:01

You need to put your data on an SSD.
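One way to check whether disk reads are the bottleneck is to time how fast the training files can be read from their current location. The following is a minimal sketch, not part of the mtcnn repo: the function name measure_read_throughput and the limit parameter are made up for illustration, and the directory you pass in is whatever folder holds your training images.

```python
import os
import time


def measure_read_throughput(directory, limit=1000):
    """Read up to `limit` files under `directory` and time the reads.

    Returns (files_read, bytes_read, elapsed_seconds).
    """
    files_read = 0
    bytes_read = 0
    start = time.perf_counter()
    for root, _dirs, names in os.walk(directory):
        for name in names:
            if files_read >= limit:
                break
            # Read the whole file, as an image loader would have to.
            with open(os.path.join(root, name), "rb") as handle:
                bytes_read += len(handle.read())
            files_read += 1
        if files_read >= limit:
            break
    elapsed = time.perf_counter() - start
    return files_read, bytes_read, elapsed
```

Dividing files_read by elapsed gives a rough files-per-second figure to compare against the samples/sec in the training log. A second run over the same files may be much faster because the operating system caches recently read files, so the first run is the more telling one.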

xiaoxiongli • Feb 05 '18 08:02

@xiaoxiongli Thank you. How long does training take on your PC, and what is your PC's configuration? Thanks.

tzhang2014 • Feb 05 '18 13:02

@tzhang2014 I ran into this problem too. How did you fix it?

INFO:root:Epoch[0] Batch [200] Speed: 126.56 samples/sec Train-Accuracy=0.697195
INFO:root:Epoch[0] Batch [200] Speed: 126.56 samples/sec Train-LogLoss=0.614800
INFO:root:Epoch[0] Batch [200] Speed: 126.56 samples/sec Train-BBOX_MSE=0.106309

linsoncvw • Apr 24 '18 06:04

Only the first epoch is slow; the later ones are very fast.

linsoncvw • Apr 24 '18 09:04

You can change MXNet's environment variables to speed up training, for example from the command line:

export MXNET_GPU_WORKER_NTHREADS=4 (default = 2)
export MXNET_GPU_COPY_NTHREADS=4 (default = 1)

After I did this, everything got better.

E.g., on an i7-7700 with a GTX 1060:

INFO:root:Epoch[0] Batch [3780] Speed: 8343.78 samples/sec Accuracy=0.898810 LogLoss=0.270442 BBOX_MSE=0.015827
INFO:root:Epoch[0] Batch [3800] Speed: 9112.26 samples/sec Accuracy=0.891901 LogLoss=0.282063 BBOX_MSE=0.015802
INFO:root:Epoch[0] Batch [3820] Speed: 10172.07 samples/sec Accuracy=0.883745 LogLoss=0.303172 BBOX_MSE=0.015691
INFO:root:Epoch[0] Batch [3840] Speed: 10388.03 samples/sec Accuracy=0.878459 LogLoss=0.288958 BBOX_MSE=0.015310
INFO:root:Epoch[0] Batch [3860] Speed: 9720.13 samples/sec Accuracy=0.885983 LogLoss=0.310603 BBOX_MSE=0.015680
INFO:root:Epoch[0] Batch [3880] Speed: 9980.33 samples/sec Accuracy=0.879565 LogLoss=0.300225 BBOX_MSE=0.016198
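If you would rather not rely on the shell, the same two variables can be set from Python. This is only a sketch: the helper name configure_mxnet_threads is made up for illustration, the value 4 is simply the one used in this comment, and it assumes the variables need to be in place before mxnet is imported, so the import is left as a comment at the end.

```python
import os


def configure_mxnet_threads(worker_threads=4, copy_threads=4):
    """Set the two MXNet GPU thread variables for the current process."""
    os.environ["MXNET_GPU_WORKER_NTHREADS"] = str(worker_threads)
    os.environ["MXNET_GPU_COPY_NTHREADS"] = str(copy_threads)
    # Return what was set so the caller can log or verify it.
    return {
        "MXNET_GPU_WORKER_NTHREADS": os.environ["MXNET_GPU_WORKER_NTHREADS"],
        "MXNET_GPU_COPY_NTHREADS": os.environ["MXNET_GPU_COPY_NTHREADS"],
    }


settings = configure_mxnet_threads()

# Assumption: import mxnet only after the variables are set, e.g.
# import mxnet as mx
```

Whether 4 is the best value depends on your CPU and GPU, so it is worth comparing the samples/sec in the log for a few settings.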

Qidian213 • Apr 27 '18 13:04

@linsoncvw After the first epoch, training becomes very fast. I don't understand why.

tzhang2014 • Jun 06 '18 06:06

Did you run into "Cannot find argument 'out_grad'" when using train_P_net.py?

geoffzhang • Jun 14 '18 02:06

@geoffzhang I ran into the same problem. Did you fix it?

EmiPark • Jul 03 '18 06:07

@geoffzhang @EmiPark Delete every 'out_grad=True' in core\symbol.py.
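If there are many occurrences, the edit can be scripted. Below is a rough sketch that works on source text; the helper name strip_out_grad and the sample line are made up for illustration and are not copied from core\symbol.py. It only handles the case where out_grad=True follows another argument after a comma, so check the result by hand.

```python
import re

# Matches ", out_grad=True" with optional spaces around "," and "=".
_OUT_GRAD = re.compile(r",\s*out_grad\s*=\s*True")


def strip_out_grad(source):
    """Return `source` with every ', out_grad=True' argument removed."""
    return _OUT_GRAD.sub("", source)


# Hypothetical example line, not taken from the repository.
sample = 'pred = make_output(data=x, label=y, out_grad=True, name="pred")'
cleaned = strip_out_grad(sample)
```

To apply it to the real file, read core\symbol.py as text, pass the contents through strip_out_grad, and write the result back after keeping a backup copy.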

zuoqing1988 • Oct 10 '18 06:10

Regarding "@geoffzhang @EmiPark delete all 'out_grad=True' in core\symbol.py": does deleting "out_grad = True" have any impact on training?

cuiyong127 • Sep 05 '19 03:09