[models] Add model compression utils

Open · fg-mindee opened this issue 4 years ago · 5 comments

Add a doctr.models.utils module to compress existing models and improve their latency / memory footprint for CPU inference. Some interesting leads to investigate (a rough conversion sketch follows the list):

  • [x] FP conversion (#10)
  • [x] Quantization (#10)
  • [ ] Pruning (cf. https://www.tensorflow.org/model_optimization/guide/pruning/comprehensive_guide)
  • [x] TF Lite export (#10)
  • [x] ONNX export (cf. https://github.com/onnx/keras-onnx & https://github.com/onnx/tensorflow-onnx)
  • [x] Export to SavedModel (#246)

Optional: TensorRT export (cf. https://developer.nvidia.com/blog/speeding-up-deep-learning-inference-using-tensorflow-onnx-and-tensorrt/)
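
For illustration, a minimal sketch of the FP conversion, dynamic-range quantization and TF Lite export leads via `tf.lite.TFLiteConverter` (MobileNetV2 is a stand-in for an actual doctr model; names and paths are illustrative):

```python
# Hedged sketch only: post-training FP16 conversion and dynamic-range
# quantization through the TF Lite converter. MobileNetV2 is a placeholder
# for an actual doctr model.
import tensorflow as tf

model = tf.keras.applications.MobileNetV2(weights=None)

# FP16 conversion: store weights as float16, roughly halving model size
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
fp16_bytes = converter.convert()

# Dynamic-range quantization: int8 weights, float compute at runtime
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
int8_bytes = converter.convert()

with open("model_fp16.tflite", "wb") as f:
    f.write(fp16_bytes)
with open("model_int8.tflite", "wb") as f:
    f.write(int8_bytes)
```

Note that dynamic-range quantization needs no calibration data; full int8 quantization would additionally require a representative dataset.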

fg-mindee · Jan 11 '21

ONNX conversion seems to be incompatible with TF 2.4.* as per https://github.com/onnx/keras-onnx/issues/662. I tried on my end and encountered the same problem. Moving this to the next release until this gets fixed!

fg-mindee · Feb 02 '21

A good lead for ONNX support would be https://github.com/onnx/tensorflow-onnx (we might have to create a SavedModel to use it, but it's worth a look).
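
To make that concrete, a rough sketch of both routes through tf2onnx, assuming a placeholder Keras model and illustrative paths/opset:

```python
# Hedged sketch only: two tf2onnx conversion paths; model, paths and opset
# are illustrative, not doctr's actual API.
import tensorflow as tf
import tf2onnx

model = tf.keras.applications.MobileNetV2(weights=None)  # placeholder model

# Option 1: go through a SavedModel, then use the tf2onnx CLI:
model.save("saved_model_dir")
#   python -m tf2onnx.convert --saved-model saved_model_dir --opset 13 --output model.onnx

# Option 2: convert the Keras model directly from Python
spec = (tf.TensorSpec((None, 224, 224, 3), tf.float32, name="input"),)
onnx_model, _ = tf2onnx.convert.from_keras(
    model, input_signature=spec, opset=13, output_path="model.onnx"
)
```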

fg-mindee · Mar 17 '21

@frgfm I think we can remove the TensorRT point if we support ONNX, wdyt?

felixdittrich92 · Sep 04 '22

Yes, sure! We'll need to take a look at pruning at some point.
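
For later reference, a rough sketch of what magnitude pruning could look like with tensorflow-model-optimization, following the comprehensive guide linked in the issue (the toy model and schedule values are illustrative, not doctr's API):

```python
# Hedged sketch only: magnitude pruning per the TF Model Optimization guide.
# Toy model and schedule values are illustrative.
import tensorflow as tf
import tensorflow_model_optimization as tfmot

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10),
])

# Ramp sparsity from 0% to 50% of weights over the first 1000 training steps
schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0, final_sparsity=0.5, begin_step=0, end_step=1000
)
pruned = tfmot.sparsity.keras.prune_low_magnitude(model, pruning_schedule=schedule)
pruned.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# Fine-tuning needs the pruning callback, e.g.:
#   pruned.fit(x, y, callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# Strip the pruning wrappers before export so the graph stays lean
final_model = tfmot.sparsity.keras.strip_pruning(pruned)
```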

frgfm · Sep 05 '22

Yeah, pruning is fine, but TensorRT is a bit too much (users should handle that on their own; if we can provide ONNX, it shouldn't be too tricky).

felixdittrich92 · Sep 05 '22