keras-onnx
Fine-tuned Keras VGG16 shows no performance advantage.

This is a comparison of the raw VGG16 Keras model's inference time and the same model on ONNX Runtime. Why don't I see any performance advantage?
There is only an extremely small improvement.
Replicate the results by running this notebook on a Colab CPU.
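For reference, this is roughly the shape of the comparison (a minimal sketch, not the notebook itself; the model path `vgg16.onnx`, the input shape, and the run counts are placeholders, and it assumes keras2onnx and onnxruntime are installed):

```python
import time
import numpy as np
import keras2onnx
import onnxruntime as ort
from tensorflow.keras.applications import VGG16

# Load the Keras model and convert it to ONNX
model = VGG16(weights="imagenet")
onnx_model = keras2onnx.convert_keras(model, model.name)
keras2onnx.save_model(onnx_model, "vgg16.onnx")

x = np.random.rand(1, 224, 224, 3).astype(np.float32)
sess = ort.InferenceSession("vgg16.onnx")
input_name = sess.get_inputs()[0].name

# Time raw Keras inference
start = time.time()
for _ in range(100):
    model.predict(x)
print("keras  avg:", (time.time() - start) / 100)

# Time ONNX Runtime inference on the converted model
start = time.time()
for _ in range(100):
    sess.run(None, {input_name: x})
print("onnxrt avg:", (time.time() - start) / 100)
```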
Whenever measuring the performance of AI models, please note the following:
- More CPU cores will not make the model faster unless the framework supports concurrent execution of layers. On a CPU-only machine, you can improve throughput by loading multiple model instances across cores (see the sketch after this list). A single model can take at most 100% of one core; even if you have 15 remaining cores, they will not be used.
- GPUs, on the other hand, can make model inference faster because they can parallelize layer operations and matrix multiplications across CUDA cores. The more CUDA cores you have, the faster the inference will be. This holds irrespective of the DL framework, as they all use cuDNN bindings. GPUs can also execute batches of inputs at once because of the nature of GPU hardware design.
So the rule of thumb => CPU : concurrency :: GPU : batching
Obviously, none of these optimizations will make a single model run faster on CPU, because its utilisation will never exploit multiple cores.
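To illustrate the concurrency point, here is a rough sketch (the model path, worker count, and run counts are arbitrary assumptions) of running independent onnxruntime sessions in separate processes so each one can occupy its own core:

```python
import multiprocessing as mp
import numpy as np

def worker(model_path, n_runs):
    # Import and create the session inside the process so each
    # worker owns its own independent onnxruntime session.
    import onnxruntime as ort
    sess = ort.InferenceSession(model_path)
    name = sess.get_inputs()[0].name
    x = np.random.rand(1, 224, 224, 3).astype(np.float32)
    for _ in range(n_runs):
        sess.run(None, {name: x})

if __name__ == "__main__":
    # e.g. 4 cores -> 4 independent model instances running concurrently
    procs = [mp.Process(target=worker, args=("vgg16.onnx", 25))
             for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

This improves aggregate throughput (inputs processed per second), not the latency of any single inference.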
Hey @Narasimha1997, I do not understand why ONNX does not make models faster. Hugging Face uses ONNX to run large pretrained networks on CPU. So can't I replicate the same using keras-onnx? Or do I have to use ONNX models converted from PyTorch models?
When you use onnxruntime to evaluate performance (say, run 100 times), please skip the first few runs (for example, 10) of the evaluation. Especially on the first run, onnxruntime needs to do some extra work, so it costs much more time than usual.
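Something like this (a rough sketch; the model path, input shape, and run counts are just placeholders):

```python
import time
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("vgg16.onnx")
name = sess.get_inputs()[0].name
x = np.random.rand(1, 224, 224, 3).astype(np.float32)

# Warm-up runs: not timed, so one-time setup cost is excluded
for _ in range(10):
    sess.run(None, {name: x})

# Timed runs
start = time.time()
for _ in range(100):
    sess.run(None, {name: x})
print("avg latency:", (time.time() - start) / 100)
```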
Hey @jiafatom, the results were smashing for a LeNet-type architecture (up to 177 times faster) using your method. But VGG16 shows NO improvement. Updated the notebook.
For this perf issue, I feel that the converter already does its job well, and this is an onnxruntime issue. You may need to reach out to the onnxruntime repo and post the question there.