
Fine-tuned Keras VGG16 shows no performance advantage.


[screenshot: inference-time comparison of the raw Keras VGG16 model and the same model on onnxruntime]

This is a comparison of the raw Keras VGG16 model's inference time and the same model running on onnxruntime. Why don't I see any performance advantage?

There is only an extremely small improvement.

Results can be replicated by running this notebook on a Colab CPU runtime (a sketch of the comparison follows below).
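For reference, a minimal sketch of such a comparison, with one timed run per framework. The `vgg16.onnx` file name and the single-image input are illustrative assumptions, not the notebook itself:

```python
import time

import numpy as np
import onnxruntime as ort
from tensorflow.keras.applications import VGG16

import keras2onnx

# Build and convert the model (random weights are fine for timing).
keras_model = VGG16(weights=None)
onnx_model = keras2onnx.convert_keras(keras_model, keras_model.name)
keras2onnx.save_model(onnx_model, "vgg16.onnx")

x = np.random.rand(1, 224, 224, 3).astype(np.float32)

# Time one Keras inference.
start = time.perf_counter()
keras_model.predict(x)
print("Keras:      ", time.perf_counter() - start, "s")

# Time one onnxruntime inference. Note: a single timed run includes
# one-off setup cost; see the warm-up discussion later in this thread.
sess = ort.InferenceSession("vgg16.onnx")
input_name = sess.get_inputs()[0].name
start = time.perf_counter()
sess.run(None, {input_name: x})
print("onnxruntime:", time.perf_counter() - start, "s")
```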

rozeappletree avatar Oct 20 '20 10:10 rozeappletree

Whenever measuring the performance of AI models, please note the following:

  1. More CPU cores will not make a model faster unless the framework supports concurrent execution of layers. On a CPU-only machine you can improve throughput by loading more model instances across multiple cores (see the sketch after this list). A single model can use at most 100% of one core; even if you have 15 other cores free, they will not be used.
  2. GPUs, on the other hand, can make inference faster because they parallelize layer operations and matrix multiplications across CUDA cores: the more CUDA cores, the faster the inference. This holds regardless of the DL framework, since they all use cuDNN bindings. GPUs can also execute batches of inputs at once because of how the hardware is designed.

So the rule of thumb => CPU : concurrency :: GPU : batching

None of these optimizations will make a single model faster on CPU, because a single inference will never exploit multiple cores.
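A minimal sketch of the concurrency pattern from point 1, assuming an already-exported `vgg16.onnx`, with one onnxruntime session per worker process. The file name, pool size, and input shape are illustrative assumptions:

```python
from multiprocessing import Pool

import numpy as np


def predict(x):
    # Each worker builds its own session. In a real setup you would load
    # it once per process (e.g. via a Pool initializer); it is inline
    # here only for brevity.
    import onnxruntime as ort
    sess = ort.InferenceSession("vgg16.onnx")
    name = sess.get_inputs()[0].name
    return sess.run(None, {name: x})[0]


if __name__ == "__main__":
    # Four independent inputs served in parallel, one per core.
    inputs = [np.random.rand(1, 224, 224, 3).astype(np.float32)
              for _ in range(4)]
    with Pool(processes=4) as pool:
        outputs = pool.map(predict, inputs)
    print(len(outputs), "predictions")
```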

Narasimha1997 avatar Oct 20 '20 14:10 Narasimha1997

Hey @Narasimha1997, I do not understand why onnx does not make models faster. Huggingface uses onnx to run large pretrained networks on CPU, so can't I replicate the same using keras-onnx? Or do I have to use onnx models converted from PyTorch models?

rozeappletree avatar Oct 21 '20 04:10 rozeappletree

When you use onnxruntime to evaluate performance (say, running 100 times), please skip the first few runs (for example, the first 10) in the evaluation. The first run especially costs much more time than usual, since onnxruntime needs to do some extra work; a sketch of the procedure follows.
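A minimal sketch of that warm-up procedure, assuming the `vgg16.onnx` model from earlier in the thread; the warm-up and run counts follow the numbers suggested in this comment:

```python
import time

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("vgg16.onnx")
name = sess.get_inputs()[0].name
x = np.random.rand(1, 224, 224, 3).astype(np.float32)

WARMUP, RUNS = 10, 100

# Unmeasured warm-up runs absorb the one-off setup cost.
for _ in range(WARMUP):
    sess.run(None, {name: x})

# Only the steady-state runs are timed.
times = []
for _ in range(RUNS):
    start = time.perf_counter()
    sess.run(None, {name: x})
    times.append(time.perf_counter() - start)

print(f"mean over {RUNS} runs: {sum(times) / len(times) * 1000:.2f} ms")
```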

jiafatom avatar Oct 22 '20 15:10 jiafatom

Hey @jiafatom, the results were smashing for a LeNet-type architecture (up to 177× faster) using your method, but VGGNet shows NO improvement. I have updated the notebook.

rozeappletree avatar Oct 23 '20 08:10 rozeappletree

For this perf issue, I feel the converter already does its job well and this is an onnxruntime issue. You may need to reach out to the onnxruntime repo and post the question there.

jiafatom avatar Oct 23 '20 14:10 jiafatom