
Downloading detection and recognition models takes a lot of time and space on my pod

Open ai-qlik opened this issue 10 months ago • 2 comments

How to prevent Docling from downloading detection and recognition models

I have deployed a REST endpoint in Docker that calls Docling to parse (convert) documents. Every time the code is called, Docling starts by downloading detection and recognition models, which is time-consuming and heavy on memory. I would like to turn this off and prevent Docling from downloading any models.

Please see my very basic code below:

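from pathlib import Path

from docling.document_converter import DocumentConverter

def parse_with_docling(pipeline_input):
    # A new converter is created on every call.
    doc_converter = DocumentConverter()
    input_doc_path = Path(pipeline_input.input_path)
    return doc_converter.convert(input_doc_path).document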

I have a hunch that this is caused by EasyOCR. I would like to set download_enabled to false for EasyOCR without limiting the OCR feature to EasyOCR.

Thanks in advance! Arash


ai-qlik avatar Jan 14 '25 14:01 ai-qlik

You can pre-fetch the models:

from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.pipeline.standard_pdf_pipeline import StandardPdfPipeline

# To prefetch the models explicitly, uncomment:
# artifacts_path = StandardPdfPipeline.download_models_hf()

artifacts_path = "/local/path/to/artifacts"

pipeline_options = PdfPipelineOptions(artifacts_path=artifacts_path)
doc_converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
    }
)

https://ds4sd.github.io/docling/usage/#provide-specific-artifacts-path
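For the REST endpoint in the question, the configuration above could be combined with a converter that is built once and reused across requests, instead of a new DocumentConverter on every call. The following is only a minimal sketch under assumptions: artifacts_ready and get_converter are hypothetical helper names, the path is the same placeholder as above, and whether this removes every download (for example EasyOCR's own models) is not confirmed in this thread.

```python
from functools import lru_cache
from pathlib import Path

# Same placeholder path as in the snippet above; point it at your pre-fetched models.
ARTIFACTS_PATH = Path("/local/path/to/artifacts")


def artifacts_ready(path: Path) -> bool:
    """Return True if `path` is an existing directory with at least one entry."""
    return path.is_dir() and any(path.iterdir())


@lru_cache(maxsize=1)
def get_converter():
    """Build the DocumentConverter once and reuse it for later calls (hypothetical helper)."""
    if not artifacts_ready(ARTIFACTS_PATH):
        raise FileNotFoundError(f"No model artifacts found in {ARTIFACTS_PATH}")

    # Imported lazily so this module can be imported without loading docling.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    pipeline_options = PdfPipelineOptions(artifacts_path=str(ARTIFACTS_PATH))
    return DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
        }
    )


def parse_with_docling(pipeline_input):
    """Convert the document at pipeline_input.input_path with the shared converter."""
    return get_converter().convert(Path(pipeline_input.input_path)).document
```

Failing fast when the artifacts directory is missing or empty makes a misconfigured pod visible at the first request rather than after a long download attempt.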

mswaringen avatar Feb 05 '25 13:02 mswaringen

This PR should simplify all of it: https://github.com/DS4SD/docling/pull/876

dolfim-ibm avatar Feb 05 '25 13:02 dolfim-ibm

@dolfim-ibm Short of implementation defects/enhancements, this can be closed, no?

sanmai-NL avatar Mar 13 '25 11:03 sanmai-NL