Downloading detection and recognition models takes a lot of time and space on my pod

Open ai-qlik opened this issue 10 months ago • 2 comments


How to prevent Docling from downloading detection and recognition models

I have deployed a REST endpoint in Docker that calls Docling to parse (convert) documents. Every time the code is called, Docling starts by downloading detection and recognition models, which is time-consuming and heavy on memory. I would like to turn this off and prevent Docling from downloading any models!

Please see my very basic code below:

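from pathlib import Path

from docling.document_converter import DocumentConverter


def parse_with_docling(pipeline_input):
    # A new DocumentConverter is created on every call.
    doc_converter = DocumentConverter()
    input_doc_path = Path(pipeline_input.input_path)
    return doc_converter.convert(input_doc_path).document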

I have a hunch that this is caused by EasyOCR. I would like to set download_enabled to false for EasyOCR without limiting the OCR feature to EasyOCR.

Thanks in advance! Arash


Jan 14 '25 14:01 ai-qlik

You can pre-fetch the models and point Docling at the local copy:

from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.pipeline.standard_pdf_pipeline import StandardPdfPipeline

# To explicitly prefetch the models, uncomment:
# artifacts_path = StandardPdfPipeline.download_models_hf()

artifacts_path = "/local/path/to/artifacts"

pipeline_options = PdfPipelineOptions(artifacts_path=artifacts_path)
doc_converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
    }
)

https://ds4sd.github.io/docling/usage/#provide-specific-artifacts-path
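
As a rough sketch of how this could be wired into the REST handler from the question: the converter is built once per process and reused, and the pre-fetched directory is checked up front so a missing volume fails early. The environment variable name MODEL_ARTIFACTS_DIR and the helper names resolve_artifacts_path and get_converter are made up for this example; only the Docling calls already shown in this thread are used.

```python
import os
from functools import lru_cache
from pathlib import Path


def resolve_artifacts_path(env_var: str = "MODEL_ARTIFACTS_DIR") -> Path:
    """Return the pre-fetched model directory, failing fast if it is missing.

    MODEL_ARTIFACTS_DIR is a hypothetical variable name chosen for this sketch.
    """
    raw = os.environ.get(env_var)
    if not raw:
        raise RuntimeError(f"{env_var} is not set")
    path = Path(raw)
    if not path.is_dir():
        raise RuntimeError(f"{path} is not a directory")
    return path


@lru_cache(maxsize=1)
def get_converter():
    """Build the converter once per process and reuse it for every request."""
    # Imported lazily so the helper above can be used without docling installed.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    pipeline_options = PdfPipelineOptions(
        artifacts_path=str(resolve_artifacts_path())
    )
    return DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
        }
    )


def parse_with_docling(pipeline_input):
    # Reuses the cached converter instead of constructing a new one per call.
    input_doc_path = Path(pipeline_input.input_path)
    return get_converter().convert(input_doc_path).document
```

With this layout the models would be fetched once (for example when the image is built) into the directory that MODEL_ARTIFACTS_DIR points to, and the endpoint keeps calling parse_with_docling as before.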

Feb 05 '25 13:02 mswaringen

This PR should simplify all of it: https://github.com/DS4SD/docling/pull/876

Feb 05 '25 13:02 dolfim-ibm

@dolfim-ibm Short of implementation defects/enhancements, this can be closed, no?

Mar 13 '25 11:03 sanmai-NL
