samples/sec is around 39, but original is up to 400

Open alvenchen opened this issue 7 years ago • 1 comments

My GPU is a Tesla P40 with 24 GB of memory. When I run train_net.py on a single GPU, I get about 39 samples/sec, and it is not much faster on 4 GPUs with train_nets_mgpu_new.py (still around 40). I also notice that CPU utilization is very low (100%) compared to the original insightface (1200%) on a 48-core CPU.
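To put a number on the multi-GPU result, here is a rough sketch of the scaling efficiency implied by the figures above. The function name `scaling_efficiency` is only illustrative and is not part of this repo.

```python
def scaling_efficiency(single_gpu_rate: float, multi_gpu_rate: float, num_gpus: int) -> float:
    """Fraction of ideal linear speedup that the multi-GPU run achieves."""
    return multi_gpu_rate / (single_gpu_rate * num_gpus)


# Figures reported above: ~39 samples/sec on 1 GPU, ~40 samples/sec on 4 GPUs.
eff = scaling_efficiency(39.0, 40.0, 4)
print(f"scaling efficiency: {eff:.1%}")  # prints "scaling efficiency: 25.6%"
```

An efficiency this far below 100% suggests the extra GPUs are mostly idle, which would be consistent with something other than GPU compute limiting the speed.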

At first I thought it was due to slow data flow, so I changed the data feed code to use tensorflow.python.ops.data_flow_ops.FIFOQueue, borrowed from facenet, but it did not help; it was even slower.
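One way to check whether the data feed really is the bottleneck is to time it on its own, without running the training step. A minimal sketch, assuming the feed can be wrapped as a Python iterable of batches; `measure_throughput` and `fake_batches` are hypothetical names, not part of this repo or facenet.

```python
import time
from typing import Iterable, Iterator


def measure_throughput(batches: Iterable, batch_size: int, max_batches: int = 50) -> float:
    """Return samples/sec for pulling batches from the feed, with no training step."""
    start = time.perf_counter()
    n = 0
    for n, _ in enumerate(batches, start=1):
        if n >= max_batches:
            break
    elapsed = time.perf_counter() - start
    return n * batch_size / elapsed if elapsed > 0 else float("inf")


def fake_batches(delay_s: float, batch_size: int = 32) -> Iterator[list]:
    """Hypothetical stand-in for the real data feed; sleeps to mimic loading cost."""
    while True:
        time.sleep(delay_s)
        yield [0] * batch_size


if __name__ == "__main__":
    # Replace fake_batches(...) with the real batch iterator to measure it.
    rate = measure_throughput(fake_batches(0.01), batch_size=32, max_batches=20)
    print(f"input pipeline alone: {rate:.0f} samples/sec")
```

If the feed alone runs at roughly 39 samples/sec, the input pipeline is the limit; if it is much faster, the slowdown is more likely in the training graph itself.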

Compared with 200 samples/sec on facenet with inception_resnet_v2 and 400 samples/sec on the original insightface with resnet_100, I wonder what the key factor slowing things down is.

Could you give some tips? Thanks.

Aug 16 '18 03:08 alvenchen

I'm not quite sure, but I think you can find some tips in this tutorial.

Aug 16 '18 16:08 auroua