docling
Issue with reading downloaded model
@dolfim-ibm I am downloading a model in the Dockerfile using the following command:
RUN python -c "from docling.pipeline.standard_pdf_pipeline import StandardPdfPipeline; StandardPdfPipeline.download_models_hf(force=True, local_dir='/app/python/rag/resources/artifacts/')"
The model is successfully downloaded to the specified location. Despite this, my component still attempts to download the model again the first time it is used.
Could it be that you are downloading the OCR models? In your code, do_ocr is a parameter. Can you please try once, making sure it is False?
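For reference, a minimal sketch of what that check could look like. It assumes a docling version where `PdfPipelineOptions` accepts `artifacts_path` and `do_ocr`; the helper names `has_prefetched_models` and `build_converter` are hypothetical and only for illustration. The path is the one from the Dockerfile command above.

```python
from pathlib import Path

# Path taken from the Dockerfile command in the question.
ARTIFACTS_DIR = Path("/app/python/rag/resources/artifacts/")


def has_prefetched_models(artifacts_dir: Path) -> bool:
    """Return True if the directory exists and contains at least one file."""
    return artifacts_dir.is_dir() and any(p.is_file() for p in artifacts_dir.rglob("*"))


def build_converter(artifacts_dir: Path = ARTIFACTS_DIR, do_ocr: bool = False):
    """Hypothetical helper: build a converter that reads models from artifacts_dir."""
    if not has_prefetched_models(artifacts_dir):
        raise FileNotFoundError(f"No prefetched models found in {artifacts_dir}")

    # Imported here so the directory check above also works without docling installed.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    # Assumption: these two fields exist in the installed docling version.
    pipeline_options = PdfPipelineOptions(
        artifacts_path=str(artifacts_dir),
        do_ocr=do_ocr,  # False rules out the OCR models as the source of the download
    )
    return DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)}
    )
```

If a download still happens with `do_ocr=False` and the artifacts path set, the OCR models are probably not the cause.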
@dolfim-ibm Like layout and tableformer, is there a way to prefetch the OCR models as well?
It depends on the specific OCR engine. As a non-exhaustive summary:
- Tesseract requires the models to be installed as system packages
- EasyOCR has options for disabling the download of the models and setting the actual path, see https://ds4sd.github.io/docling/reference/pipeline_options/#docling.datamodel.pipeline_options.EasyOcrOptions
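For the EasyOCR case, a rough sketch of how those options might be wired up. It assumes `EasyOcrOptions` exposes `download_enabled` and `model_storage_directory` as described on the linked reference page, and that `PdfPipelineOptions` accepts `ocr_options`. The helper names and the model directory argument are hypothetical; the directory must already contain the EasyOCR model files.

```python
from pathlib import Path


def easyocr_offline_kwargs(model_dir: Path) -> dict:
    """Keyword arguments that point EasyOCR at local models and disable downloads."""
    return {
        "download_enabled": False,
        "model_storage_directory": str(model_dir),
    }


def build_ocr_pipeline_options(model_dir: Path):
    """Hypothetical helper: pipeline options that use prefetched EasyOCR models."""
    # Imported here so the kwargs helper above works without docling installed.
    from docling.datamodel.pipeline_options import EasyOcrOptions, PdfPipelineOptions

    # Assumption: both field names match the installed docling version.
    ocr_options = EasyOcrOptions(**easyocr_offline_kwargs(model_dir))
    return PdfPipelineOptions(do_ocr=True, ocr_options=ocr_options)
```

With downloads disabled, a missing model file should surface as an error at startup rather than as a silent download at first use.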
@harinisri2001 I hope your issue is addressed with the pointers from @dolfim-ibm. I will close this until further feedback.