[models] Add model compression utils

Open · fg-mindee opened this issue 4 years ago · 5 comments

Add a doctr.models.utils module to compress existing models and improve their latency / memory footprint for CPU inference. Some interesting leads to investigate (a rough conversion sketch follows the list):

  • [x] FP conversion (#10)
  • [x] Quantization (#10)
  • [ ] Pruning (cf. https://www.tensorflow.org/model_optimization/guide/pruning/comprehensive_guide)
  • [x] TF Lite export (#10)
  • [x] ONNX export (cf. https://github.com/onnx/keras-onnx & https://github.com/onnx/tensorflow-onnx)
  • [x] Export to SavedModel (#246)

Optional: TensorRT export (cf. https://developer.nvidia.com/blog/speeding-up-deep-learning-inference-using-tensorflow-onnx-and-tensorrt/)
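
For illustration, a minimal sketch of the FP conversion, dynamic-range quantization and TF Lite export leads via `tf.lite.TFLiteConverter` (MobileNetV2 is a stand-in for an actual doctr model; names and paths are illustrative):

```python
# Hedged sketch only: post-training FP16 conversion and dynamic-range
# quantization through the TF Lite converter. MobileNetV2 is a placeholder
# for an actual doctr model.
import tensorflow as tf

model = tf.keras.applications.MobileNetV2(weights=None)

# FP16 conversion: store weights as float16, roughly halving model size
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
fp16_bytes = converter.convert()

# Dynamic-range quantization: int8 weights, float compute at runtime
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
int8_bytes = converter.convert()

with open("model_fp16.tflite", "wb") as f:
    f.write(fp16_bytes)
with open("model_int8.tflite", "wb") as f:
    f.write(int8_bytes)
```

Note that dynamic-range quantization needs no calibration data; full int8 quantization would additionally require a representative dataset.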

fg-mindee · Jan 11 '21

ONNX conversion seems to be incompatible with TF 2.4.* as per https://github.com/onnx/keras-onnx/issues/662. I tried on my end and encountered the same problem. Moving this to the next release until this gets fixed!

fg-mindee · Feb 02 '21

A good lead for ONNX support would be https://github.com/onnx/tensorflow-onnx (we might have to create a SavedModel to use it, but it's worth a look).
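
To make that concrete, a rough sketch of both routes through tf2onnx, assuming a placeholder Keras model and illustrative paths/opset:

```python
# Hedged sketch only: two tf2onnx conversion paths; model, paths and opset
# are illustrative, not doctr's actual API.
import tensorflow as tf
import tf2onnx

model = tf.keras.applications.MobileNetV2(weights=None)  # placeholder model

# Option 1: go through a SavedModel, then use the tf2onnx CLI:
model.save("saved_model_dir")
#   python -m tf2onnx.convert --saved-model saved_model_dir --opset 13 --output model.onnx

# Option 2: convert the Keras model directly from Python
spec = (tf.TensorSpec((None, 224, 224, 3), tf.float32, name="input"),)
onnx_model, _ = tf2onnx.convert.from_keras(
    model, input_signature=spec, opset=13, output_path="model.onnx"
)
```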

fg-mindee · Mar 17 '21

@frgfm I think we can remove the TensorRT point if we support ONNX, wdyt?

felixdittrich92 · Sep 04 '22

Yes, sure! We'll need to take a look at pruning at some point.
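
For later reference, a rough sketch of what magnitude pruning could look like with tensorflow-model-optimization, following the comprehensive guide linked in the issue (the toy model and schedule values are illustrative, not doctr's API):

```python
# Hedged sketch only: magnitude pruning per the TF Model Optimization guide.
# Toy model and schedule values are illustrative.
import tensorflow as tf
import tensorflow_model_optimization as tfmot

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10),
])

# Ramp sparsity from 0% to 50% of weights over the first 1000 training steps
schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0, final_sparsity=0.5, begin_step=0, end_step=1000
)
pruned = tfmot.sparsity.keras.prune_low_magnitude(model, pruning_schedule=schedule)
pruned.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# Fine-tuning needs the pruning callback, e.g.:
#   pruned.fit(x, y, callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# Strip the pruning wrappers before export so the graph stays lean
final_model = tfmot.sparsity.keras.strip_pruning(pruned)
```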

frgfm · Sep 05 '22

Yeah, pruning is fine, but TensorRT is a bit too much (users should handle that on their own; if we can provide ONNX, it shouldn't be too tricky).

felixdittrich92 · Sep 05 '22