
Wrong baseline models to measure speedup against

Open dhingratul opened this issue 6 years ago • 9 comments

Describe the bug: I ran two distinct experiments on the same ResNet model, one with uniform quantization and one with channel pruning. However, the two optimizations produced different model_original baselines against which speedup is measured: the uniform-quantization one runs at 25ms, while the channel-pruning one runs at 20ms. How are you measuring the baseline?

dhingratul avatar Dec 04 '18 23:12 dhingratul

@dhingratul We need some details to reproduce your issue.

  • Which dataset are you using, CIFAR-10 or ImageNet?
  • What is the depth of your ResNet model?
  • Are you using *.pb models to measure the time consumption?
  • Which device are you using for inference computation, CPU or GPU?
  • What is the mini-batch size?

jiaxiang-wu avatar Dec 05 '18 06:12 jiaxiang-wu

  1. ResNet
  2. resnet_at_ilsvrc12_run.py
  3. .pb
  4. GPU
  5. Whatever is defined here: run_local.sh nets/resnet_at_ilsvrc12_run.py

dhingratul avatar Dec 05 '18 17:12 dhingratul

Got it. We are reproducing your issue.

jiaxiang-wu avatar Dec 05 '18 23:12 jiaxiang-wu

@dhingratul Sorry, we cannot reproduce your issue. According to our results (see the benchmark code for inference speed in PR #136), the model_original.pb generated by export_chn_pruned_tflite_model.py costs 3.23ms and the one generated by export_quant_tflite_model.py costs 3.34ms, which are essentially the same.

Some notes:

  • We set batch size to 1. Is this the same on your side?
  • The inference time (~3.3ms) is much shorter than the 20-25ms mentioned in your comment.
  • Can you test your *.pb models with our benchmark code in #136?
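The benchmark code in #136 is not reproduced in this thread, but the protocol it implies (batch size 1, repeated runs, warm-up excluded) can be sketched with a stand-in inference callable; `run_inference` below is a hypothetical placeholder for a session run on the frozen graph:

```python
import time

def benchmark(run_inference, n_runs=100, n_warmup=1):
    """Return the average latency in ms over n_runs, discarding n_warmup
    initial calls (the first inference is inflated by GPU warm-up)."""
    timings = []
    for _ in range(n_runs + n_warmup):
        start = time.perf_counter()
        run_inference()
        timings.append((time.perf_counter() - start) * 1000.0)
    return sum(timings[n_warmup:]) / n_runs

# Stand-in workload; replace with a session run on model_original.pb.
avg_ms = benchmark(lambda: sum(x * x for x in range(1000)))
```

The same helper can then be pointed at each exported model_original.pb, so that both baselines are timed under identical conditions.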

jiaxiang-wu avatar Dec 10 '18 01:12 jiaxiang-wu

  1. Yes. The only difference I see is that you generate the input data with all zeros, while I generate it with np.random.rand. I use the same batch size and average over 1000 runs (you use 100), leaving out the first inference because that time is always inflated by GPU warm-up. The difference in inference times could be due to different GPU architectures; I am more interested in the percent speedup than in the absolute numbers.
  2. The 20ms inferences were on an older-generation GPU. On a 1080Ti, I see model_dcp_eval/model_original.pb run at ~5ms and model_uqtf_eval/model_original.pb run at ~8ms.
  3. The bug is reproducible with your benchmark code as well.
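The zeros-versus-random difference in point 1 only changes the input values, not the tensor shape or dtype, so it should not affect which baseline is faster. A small sketch of both variants (the ImageNet-style input shape (1, 224, 224, 3) is an assumption; adjust to the actual graph input):

```python
import numpy as np

BATCH = 1  # matches the batch size used in both benchmarks
# Hypothetical ImageNet-style input shape; the real graph input may differ.
shape = (BATCH, 224, 224, 3)

zeros_input = np.zeros(shape, dtype=np.float32)           # all-zero input, as in #136
random_input = np.random.rand(*shape).astype(np.float32)  # uniform [0, 1) input
```

Either tensor exercises the same graph, so the choice of input data alone should not explain a different model_original latency.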

dhingratul avatar Dec 10 '18 18:12 dhingratul

@dhingratul After changing np.zeros to np.random.rand, I still cannot reproduce your issue. Here are my results:

  • model_original.pb (chn-pruned): 3.48ms, 3.41ms, 3.27ms, 3.26ms
  • model_original.pb (quant): 3.47ms, 3.41ms, 3.38ms, 3.24ms

P.S.: I am using a P40 GPU.

Can you post your *.pb model files, so I can test on them?

jiaxiang-wu avatar Dec 11 '18 01:12 jiaxiang-wu

@jiaxiang-wu DCP models https://drive.google.com/open?id=1NijcwZ-Cwd-Nqa73E2D5nTL_X2yhB32a UQTF Model https://drive.google.com/open?id=1LIYaJZclwBllEThoWZScj23Sq4_LkUxx

dhingratul avatar Dec 19 '18 23:12 dhingratul

Thanks a lot. We are looking into this issue.

jiaxiang-wu avatar Dec 20 '18 01:12 jiaxiang-wu

Got the same results as @dhingratul. Details:

  • I used the same export_quant_tflite_model.py to generate the *.pb files; the --model_file comes from models_eval (full_prec_model) and models_uqtf_eval (quant_model).

For GPU:

  • model_original.pb (full_prec_model): 5.15ms, 4.84ms, 4.68ms, 5.07ms
  • model_original.pb (quant): 6.53ms, 6.24ms, 6.60ms, 6.77ms

For CPU:

  • model_original.tflite (full_prec_model): 66.77ms, 68.41ms
  • model_original.tflite (quant): 88.07ms, 85.29ms
  • model_quantized.tflite (quant): 187.28ms, 190.10ms

Setup: backbone ResNet-18, GPU 1080Ti, CPU E5-2650 v4, OS Ubuntu 14.04.
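Taking the CPU latencies above at face value, the relative slowdown of the quantization-path baseline can be computed directly (plain arithmetic on the reported numbers):

```python
# Mean CPU latencies from the measurements above (ms).
full_prec = (66.77 + 68.41) / 2       # model_original.tflite (full_prec_model)
quant_pb = (88.07 + 85.29) / 2        # model_original.tflite (quant)
quant_tflite = (187.28 + 190.10) / 2  # model_quantized.tflite (quant)

# The "original" baseline exported on the quantization path comes out
# roughly 1.28x slower, which is the discrepancy this issue reports.
ratio_baseline = quant_pb / full_prec
ratio_quantized = quant_tflite / full_prec
```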

Thanks a lot.

Tingelam avatar Dec 25 '18 09:12 Tingelam