Improve Deployment Efficiency: Integrate ONNX Runtime as Inference Engine Option
Requested feature
First of all, congrats on the amazing work!
I have two improvement ideas that might help simplify using this library in a wider range of production workloads:
- Support for ONNX Runtime Inference Engine: This feature request proposes adding support for ONNX Runtime as an inference engine option alongside the existing PyTorch backend. It addresses the need for a reduced deployment size and simplified dependencies. The large size of the PyTorch runtime library is a significant obstacle when deploying AI models in resource-constrained environments (e.g., edge devices, embedded systems). ONNX Runtime provides a lightweight and efficient alternative, enabling smaller deployment packages and faster startup times (see the first sketch after this list).
- Support for Smaller, Faster AI Models: This feature request proposes allowing users to select smaller, potentially less accurate AI models for faster inference, particularly on CPU-constrained hardware. While larger, more accurate models may be preferred in some scenarios, many applications prioritize speed and low latency over absolute precision. Supporting a wider range of model sizes gives users more control over the trade-off between accuracy and performance (see the second sketch after this list).
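
To make the first point concrete, here is a minimal sketch of the usual export-and-run flow with ONNX Runtime, assuming a generic PyTorch module. `layout_model`, the tensor shapes, and the file name are placeholders for illustration only, not this library's actual API:

```python
# Sketch only: export a PyTorch model to ONNX once at build time, then serve it
# with ONNX Runtime so the heavy torch dependency is not needed in production.
import numpy as np
import torch
import onnxruntime as ort

# Placeholder for whichever torch.nn.Module the library would normally load.
layout_model = torch.nn.Linear(128, 10).eval()
dummy_input = torch.randn(1, 128)

# One-time export step (done at packaging time, where torch is still available).
torch.onnx.export(
    layout_model, dummy_input, "layout_model.onnx",
    input_names=["input"], output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}},
)

# Inference needs only the lightweight onnxruntime package.
session = ort.InferenceSession("layout_model.onnx", providers=["CPUExecutionProvider"])
outputs = session.run(["logits"], {"input": np.random.rand(1, 128).astype(np.float32)})
print(outputs[0].shape)  # (1, 10)
```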
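For the second point, selecting smaller model checkpoints is ultimately a library decision, but one common way to trade a little accuracy for CPU speed on an already exported ONNX graph is dynamic quantization. A short sketch, with placeholder file names, using ONNX Runtime's quantization utility:

```python
# Shrink the FP32 ONNX model from the sketch above by storing weights as INT8.
# This is one illustrative option, not necessarily the model-size control the
# feature request asks for.
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="layout_model.onnx",        # FP32 model exported earlier
    model_output="layout_model.int8.onnx",  # quantized, smaller on disk
    weight_type=QuantType.QInt8,
)

# The quantized file loads exactly like the original:
# ort.InferenceSession("layout_model.int8.onnx", providers=["CPUExecutionProvider"])
```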
Alternatives
...
@CVxTz Yes, we have this on our roadmap. There is some work in progress outlined here from @pavel-denisov-fraunhofer on the layout-model to run it in ONNX; we will pick this up after we have completed GPU support, because it will change the model codebase significantly.
Great! Thank you
This feature is extremely important. I have observed that the library runs very slowly when executed on arm64.