Make RapidOcrOptions ignore `artifacts_path`
Requested feature
Currently, when using RapidOCR, the initialization of `RapidOcrModel` honors `artifacts_path`. That is, it searches for the model artifacts under `artifacts_path`, as defined here:
- https://github.com/docling-project/docling/blob/6a04e273528691eb22a5708f1270d4c5fa8f5b7c/docling/models/auto_ocr_model.py#L65-L74
- https://github.com/docling-project/docling/blob/6a04e273528691eb22a5708f1270d4c5fa8f5b7c/docling/models/rapid_ocr_model.py#L125-L149
When rapidocr is installed, it already ships with the base ONNX models under ...\Lib\site-packages\rapidocr\models. Therefore, we usually do not need to download these models separately from Modelscope. However, when `artifacts_path` is set in the pipeline, e.g., for loading the layout detection or table structure model, we are currently unable to load the default ONNX models shipped with rapidocr.
@geoHeil Would it be possible to either a) deliberately skip a globally defined `artifacts_path` for RapidOCR, so that the shipped models are loaded and nothing is downloaded from Modelscope, or b) make loading from ...\Lib\site-packages\rapidocr\models the default?
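To make the request concrete, here is a minimal sketch of one possible resolution order. It is not docling's actual implementation. The helper names `shipped_models_dir` and `resolve_ocr_model_dir` are hypothetical, and so is the `RapidOcr` subfolder name assumed under `artifacts_path`. The sketch uses only the standard library: it prefers a RapidOCR folder under `artifacts_path` if one exists, and otherwise falls back to the `models` folder inside the installed `rapidocr` package.

```python
from importlib.util import find_spec
from pathlib import Path
from typing import Optional, Union


def shipped_models_dir(package: str = "rapidocr", subdir: str = "models") -> Optional[Path]:
    """Return <installed package dir>/<subdir>, or None if it cannot be found."""
    try:
        spec = find_spec(package)
    except (ImportError, ValueError):
        return None
    if spec is None or not spec.submodule_search_locations:
        return None
    for location in spec.submodule_search_locations:
        candidate = Path(location) / subdir
        if candidate.is_dir():
            return candidate
    return None


def resolve_ocr_model_dir(
    artifacts_path: Optional[Union[str, Path]],
    repo_folder: str = "RapidOcr",  # assumed folder name, for illustration only
    package: str = "rapidocr",
    subdir: str = "models",
) -> Optional[Path]:
    """Prefer artifacts_path/<repo_folder>; otherwise use the shipped models."""
    if artifacts_path is not None:
        candidate = Path(artifacts_path) / repo_folder
        if candidate.is_dir():
            return candidate
    # artifacts_path is unset or has no RapidOCR folder: fall back to the
    # models bundled with the installed package (no download needed).
    return shipped_models_dir(package, subdir)
```

With this order, setting `artifacts_path` for the layout or table structure models would not block RapidOCR from using its bundled ONNX files. A `None` result would mean neither location exists, and the caller would then have to decide whether to download.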
Sounds like it should be. Would you want to raise a PR?
@geoHeil I was hoping you could step in. 😁 I would first have to set everything up to prepare a proper PR, and unfortunately I am not sure I can find the time right now.
Hm, I am a bit tight on time in the next few weeks as well.