Training speed is around 39 samples/sec, but the original reaches up to 400
My GPU is a Tesla P40 with 24 GB of memory. When I run train_net.py on a single GPU, I get about 39 samples/sec, and it is not much faster on 4 GPUs with train_nets_mgpu_new.py (still around 40). I also notice that CPU utilization is very low (100%) compared to the original insightface (1200%) on a 48-core CPU.
At first I thought it was due to a slow data pipeline, so I changed the data feed code to use tensorflow.python.ops.data_flow_ops.FIFOQueue, borrowed from facenet, but it did not help and was even slower.
Compared with 200 samples/sec on facenet with inception_resnet_v2 and 400 samples/sec on the original insightface with resnet_100, I wonder what the main factor slowing it down is.
Could you give some tips? Thanks.
I'm not quite sure, but I think you can find some tips in this tutorial.
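One way to narrow it down might be to time the input pipeline on its own and compare that with the full training loop. If the two rates are close and both far below 400 samples/sec, the data feed is probably the bottleneck (which would fit the low CPU utilization); if the data-only rate is much higher, the time is going into the training step. Below is a rough sketch in plain Python. The names `measure_throughput`, `fake_batches` and `fake_step` are made up for illustration and are not part of this repo; you would swap in your real batch iterator and your real `sess.run` call.

```python
import time
from typing import Callable, Iterable, Iterator, List, Optional


def measure_throughput(batches: Iterable,
                       batch_size: int,
                       step_fn: Optional[Callable] = None,
                       max_batches: int = 50) -> float:
    """Return samples/sec over at most `max_batches` batches.

    If `step_fn` is None, only the input pipeline is timed.
    Otherwise `step_fn(batch)` is called for every batch, so the
    result covers data loading plus the training step.
    """
    count = 0
    start = time.perf_counter()
    for batch in batches:
        if step_fn is not None:
            step_fn(batch)
        count += 1
        if count >= max_batches:
            break
    # Guard against a zero interval when no batches were consumed.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return count * batch_size / elapsed


def fake_batches(n: int, batch_size: int = 8,
                 delay: float = 0.01) -> Iterator[List[int]]:
    """Hypothetical stand-in for a data loader: `delay` seconds per batch."""
    for _ in range(n):
        time.sleep(delay)  # simulates decoding / augmentation cost
        yield [0] * batch_size


def fake_step(batch: List[int], delay: float = 0.001) -> None:
    """Hypothetical stand-in for one training step (e.g. a sess.run call)."""
    time.sleep(delay)


if __name__ == "__main__":
    data_only = measure_throughput(fake_batches(20), batch_size=8)
    full_loop = measure_throughput(fake_batches(20), batch_size=8,
                                   step_fn=fake_step)
    print("data only: %.1f samples/sec" % data_only)
    print("data + step: %.1f samples/sec" % full_loop)
```

In this toy setup the loader is deliberately the slow part, so both numbers come out similar. With the real pipeline, the gap between the two numbers should indicate where to look first.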