docling
[FEAT] Make installation of non-essential dependencies optional
Requested feature
The code already treats certain packages as optional: they are not installed by default, and an error message asks the user to add them when they are needed. This behavior should be extended by removing the remaining non-essential packages from the required dependencies.
When the corresponding option is not used, some packages are cumbersome, cause unnecessary conflicts, and pull in useless dependencies. This is the case, for example, with scikit-learn (and its numpy versioning problems) and the Python ninja distribution, which is not used anywhere by docling but is brought in by easyocr.
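The "not installed by default, with an error message asking the user to add it" pattern can be sketched roughly as follows. This is a minimal illustration and not docling's actual code: the helper name require_optional and the wording of the message are assumptions made for this example.

```python
import importlib
import importlib.util


def require_optional(module_name: str, extra: str):
    """Import an optional top-level module, or raise an actionable error.

    `module_name` is the importable module; `extra` is the (hypothetical)
    name of the pip extra that would provide it.
    """
    # find_spec returns None when a top-level module is not installed.
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"'{module_name}' is not installed. "
            f'Install it with: pip install "docling[{extra}]"'
        )
    return importlib.import_module(module_name)
```

With a guard like this, a heavy package is only imported at the point where the matching option is actually used, so it does not need to be a hard requirement.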
In #648, it is stated that it is preferable to integrate an OCR engine by default, and easyocr was chosen for this reason.
However, @jaluma's original idea is a good one: keep easyocr as the default unless an engine is chosen explicitly.
- Base installation (with EasyOCR):
  pip install docling
- Specific OCR models:
  pip install docling[easyocr] (default behaviour)
  pip install docling[tesseract]
  pip install docling[rapidocr]
  pip install docling[ocrmac]
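On the runtime side, "easyocr by default unless the engine is chosen explicitly" could look something like the sketch below. Everything here is hypothetical: the function pick_engine, the ENGINE_MODULES mapping, and in particular the importable module names assumed for each extra are illustrative guesses, not docling's API.

```python
import importlib.util
from typing import Callable, Optional

# Assumed mapping from extra name to the module it is expected to provide.
ENGINE_MODULES = {
    "easyocr": "easyocr",
    "tesseract": "tesserocr",
    "rapidocr": "rapidocr_onnxruntime",
    "ocrmac": "ocrmac",
}
DEFAULT_ENGINE = "easyocr"


def _is_installed(module_name: str) -> bool:
    """Return True if the top-level module can be found."""
    return importlib.util.find_spec(module_name) is not None


def pick_engine(
    requested: Optional[str] = None,
    is_installed: Callable[[str], bool] = _is_installed,
) -> str:
    """Return the engine to use: the explicit choice, else the default."""
    engine = DEFAULT_ENGINE if requested is None else requested
    if engine not in ENGINE_MODULES:
        raise ValueError(f"Unknown OCR engine: {engine!r}")
    if not is_installed(ENGINE_MODULES[engine]):
        raise ImportError(
            f"OCR engine '{engine}' is not installed. "
            f'Install it with: pip install "docling[{engine}]"'
        )
    return engine
```

The is_installed parameter exists only so the selection logic can be exercised without any of the engines being present.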
Alternatives
Install docling with --no-deps and cherry-pick only the dependencies needed for the wanted engine. This avoids the other engines' incompatibility problems, but it has to be repeated on every install and every update.