nimble
Multi-stream execution has no noticeable effect on latency
Hi, I'm trying to reproduce Nimble's experimental results. However, I found that multi-stream execution has little effect on inference latency, whereas the paper reports a speedup of up to 1.8×. Maybe I have done something wrong; I hope you can give me some advice. I installed Nimble successfully in Docker.

Environment:
- GPU: RTX 2080 SUPER with 8 GB of global memory
- OS: Ubuntu 18.04.6 LTS

Measured latencies for each model and input shape:
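The benchmark script behind the numbers below is not shown, so here is a minimal sketch of how such mean/stdev latencies are usually collected, assuming a warm-up phase and an explicit device synchronization after each run. The function name `measure_latency_ms` and its `warmup`/`repeats` defaults are hypothetical, not Nimble's own harness.

```python
import statistics
import time
from typing import Callable, Optional, Tuple


def measure_latency_ms(
    run: Callable[[], object],
    sync: Optional[Callable[[], None]] = None,
    warmup: int = 10,
    repeats: int = 100,
) -> Tuple[float, float]:
    """Return (mean, stdev) wall-clock latency of `run` in milliseconds.

    `sync` should block until all queued GPU work has finished (for example
    torch.cuda.synchronize). Without it, asynchronous kernel launches make
    the measured times far too small.
    """
    if repeats < 2:
        raise ValueError("need at least two repeats to compute a stdev")
    # Warm-up runs are executed but excluded from the statistics.
    for _ in range(warmup):
        run()
    if sync is not None:
        sync()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        if sync is not None:
            sync()  # wait for the device before stopping the clock
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples), statistics.stdev(samples)
```

With PyTorch this would be called as `measure_latency_ms(lambda: model(x), sync=torch.cuda.synchronize)` inside a `torch.no_grad()` block, assuming `model` and `x` already live on the GPU.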
# inception_v3 [1, 3, 299, 299]
               mean (ms)  stdev (ms)
pytorch        8.212887   0.211187
nimble         2.24783    0.003427
nimble-multi   2.31407    0.009554
# inception_v3 [8, 3, 299, 299]
               mean (ms)  stdev (ms)
pytorch        25.678553  0.287919
nimble         17.354554  0.065831
nimble-multi   16.428471  0.104019
# densenet201 [1, 3, 224, 224]
               mean (ms)  stdev (ms)
pytorch        29.020667  0.231637
nimble         5.537937   0.004089
nimble-multi   5.572467   0.004977
# densenet201 [8, 3, 224, 224]
               mean (ms)  stdev (ms)
pytorch        31.046828  0.164185
nimble         24.178936  0.032238
nimble-multi   24.125336  0.060498
# mnasnet0_5 [1, 3, 224, 224]
               mean (ms)  stdev (ms)
pytorch        4.477023   0.025759
nimble         0.565598   0.002112
nimble-multi   5.572467   0.004977
# mnasnet0_75 [1, 3, 224, 224]
               mean (ms)  stdev (ms)
pytorch        4.557251   0.037832
nimble         0.68727    0.002274
nimble-multi   0.679038   0.002025
# mnasnet1_3 [1, 3, 224, 224]
               mean (ms)  stdev (ms)
pytorch        4.780402   0.02905
nimble         0.950962   0.00627
nimble-multi   0.893742   0.06838
# mnasnet1_3 [8, 3, 224, 224]
               mean (ms)  stdev (ms)
pytorch        6.076544   0.567386
nimble         4.953977   0.023374
nimble-multi   4.976105   0.025923
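To quantify "little effect", the mean latencies above can be turned into speedup ratios. This is a small Python sketch; the values are copied by hand from the tables, and `speedups` is a hypothetical helper name. It compares Nimble against PyTorch and multi-stream Nimble against single-stream Nimble.

```python
# Mean latencies in ms, copied from the tables above:
# (pytorch, nimble, nimble-multi)
results = {
    "inception_v3 [1, 3, 299, 299]": (8.212887, 2.24783, 2.31407),
    "inception_v3 [8, 3, 299, 299]": (25.678553, 17.354554, 16.428471),
    "densenet201 [1, 3, 224, 224]": (29.020667, 5.537937, 5.572467),
    "densenet201 [8, 3, 224, 224]": (31.046828, 24.178936, 24.125336),
    "mnasnet0_5 [1, 3, 224, 224]": (4.477023, 0.565598, 5.572467),
    "mnasnet0_75 [1, 3, 224, 224]": (4.557251, 0.68727, 0.679038),
    "mnasnet1_3 [1, 3, 224, 224]": (4.780402, 0.950962, 0.893742),
    "mnasnet1_3 [8, 3, 224, 224]": (6.076544, 4.953977, 4.976105),
}


def speedups(pytorch_ms, nimble_ms, multi_ms):
    """Return (nimble vs. pytorch, multi-stream vs. single-stream) speedups."""
    return pytorch_ms / nimble_ms, nimble_ms / multi_ms


for name, (pt, nb, mt) in results.items():
    vs_pytorch, vs_single = speedups(pt, nb, mt)
    print(f"{name:32s} nimble/pytorch: {vs_pytorch:5.2f}x  "
          f"multi/single: {vs_single:5.2f}x")
```

For every model except mnasnet0_5 (whose reported nimble-multi mean is 5.572467 ms), the multi-stream over single-stream ratio stays between about 0.97 and 1.06, far from the 1.8× reported in the paper.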