[FEAT] Make installation of non-essential dependencies optional

Open Lrakotoson opened this issue 1 month ago • 1 comments

Requested feature

The code already anticipates that certain packages may not be installed by default, and it raises an error message asking the user to add them. This behavior should be extended by removing these non-essential packages from the required dependencies.
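To illustrate the pattern described above, here is a minimal sketch of an import guard for an optional dependency. The helper name `require_optional` and the wording of the error message are hypothetical and are not taken from docling's code; only the extras syntax `docling[...]` comes from this issue.

```python
import importlib


def require_optional(module_name: str, extra: str):
    """Import an optional module, or raise an error naming the extra to install.

    Hypothetical helper for illustration; not part of docling's API.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Point the user at the extra that would provide the missing package.
        raise ImportError(
            f"'{module_name}' is not installed. "
            f"Install it with: pip install \"docling[{extra}]\""
        ) from exc
```

With a guard like this, a package only needs to be installed when the feature that imports it is actually used.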

Some packages are cumbersome: they cause unnecessary conflicts and pull in dependencies that serve no purpose when the corresponding option is not used.
This is the case, for example, with scikit-learn (and its numpy versioning problems) and the Python ninja distribution, which docling does not use anywhere but which easyocr brings in.

In #648, it is stated that shipping an OCR engine by default is preferable, and easyocr was chosen for this reason.
However, @jaluma's original idea is a good one: keep easyocr as the default unless an engine is chosen explicitly.

  1. Base installation (with EasyOCR): pip install docling

  2. Specific OCR models:
     pip install docling[easyocr] (default behaviour)
     pip install docling[tesseract]
     pip install docling[rapidocr]
     pip install docling[ocrmac]
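The intended behavior of the two options above can be sketched as a small selection rule. This is only an illustration of the proposal, assuming the four extra names listed in this issue. It does not describe how pip resolves extras, and `resolve_engines` is a hypothetical name.

```python
DEFAULT_ENGINE = "easyocr"
# Extra names as listed in this issue.
KNOWN_ENGINES = ("easyocr", "tesseract", "rapidocr", "ocrmac")


def resolve_engines(requested_extras):
    """Return the OCR engines to install for a given list of extras.

    The default engine is used only when no engine is requested explicitly.
    """
    chosen = []
    for extra in requested_extras:
        # Keep known engines in request order, without duplicates.
        if extra in KNOWN_ENGINES and extra not in chosen:
            chosen.append(extra)
    return chosen or [DEFAULT_ENGINE]
```

Under this rule, a plain `pip install docling` would map to easyocr, while `docling[tesseract]` would map to tesseract alone, so easyocr's dependencies would not be pulled in.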

Alternatives

Install docling with --no-deps and cherry-pick only the dependencies needed for the desired engine, which avoids incompatibility problems from the other engines. This has to be repeated on every install and every update.

Lrakotoson, Jan 22 '25 18:01