Michele Dolfi comments

Results: 172 comments of Michele Dolfi

Improve backend resolution logic

As discovered in #542, some MS Office XML archives store the metadata file `[Content_Types].xml` at the end of the archive, so it is not captured by the 8 KB signature. One way of improving...
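To illustrate the problem described above, here is a minimal sketch using only the Python standard library. It builds two small in-memory ZIP archives, one with `[Content_Types].xml` written first and one with it written last, and compares a check on the first 8K bytes with a lookup through the ZIP directory. The helper names (`build_archive`, `in_signature`, `in_archive`) and the 16 KB dummy payload are assumptions made for the example; this is not docling's actual detection code.

```python
import io
import zipfile

SIGNATURE_SIZE = 8 * 1024  # the 8K-byte window mentioned in the comment
META_NAME = "[Content_Types].xml"


def build_archive(meta_last: bool) -> bytes:
    """Build an in-memory ZIP with the meta file written first or last."""
    meta = (META_NAME, b"<Types/>")
    # Uncompressed 16 KB payload, so it pushes later members past 8K bytes.
    payload = ("word/document.xml", b"\0" * (16 * 1024))
    members = [payload, meta] if meta_last else [meta, payload]
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for name, data in members:
            zf.writestr(name, data)
    return buf.getvalue()


def in_signature(data: bytes) -> bool:
    """Check only the leading bytes, as a fixed-size signature would."""
    return META_NAME.encode() in data[:SIGNATURE_SIZE]


def in_archive(data: bytes) -> bool:
    """Look the member up in the ZIP directory, wherever it was written."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return META_NAME in zf.namelist()
```

With the meta file written last, `in_signature` misses it while `in_archive` still finds it, because the ZIP central directory lists every member regardless of its position in the file.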

Improve backend resolution logic

There is no doubt that the logic has to be fixed and improved, and maybe simplified altogether. The initial use case, which was quite relevant for us, is iterating through a...

Pass HTTP request headers to docling when parsing via url

Actually, we already have the first steps for this feature. The code that downloads the files accepts custom headers, see https://github.com/DS4SD/docling-core/blob/main/docling_core/utils/file.py#L52. We only need to propagate the arguments all the...
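As a rough sketch of what propagating the headers could look like, the example below threads an optional `headers` argument from a top-level entry point down to the download call. All three function names (`build_request`, `fetch`, `convert_from_url`) are hypothetical and use only the standard library; they are not the docling or docling-core API.

```python
import urllib.request
from typing import Dict, Optional


def build_request(
    url: str, headers: Optional[Dict[str, str]] = None
) -> urllib.request.Request:
    """Create the HTTP request, attaching any caller-supplied headers."""
    return urllib.request.Request(url, headers=headers or {})


def fetch(
    url: str, headers: Optional[Dict[str, str]] = None, timeout: float = 30.0
) -> bytes:
    """Download the resource; the headers travel with the request."""
    with urllib.request.urlopen(build_request(url, headers), timeout=timeout) as resp:
        return resp.read()


def convert_from_url(url: str, headers: Optional[Dict[str, str]] = None) -> bytes:
    """Hypothetical entry point: it only forwards `headers` to the downloader.

    A real converter would parse the downloaded bytes instead of returning them.
    """
    return fetch(url, headers=headers)
```

A caller would then write something like `convert_from_url(url, headers={"Authorization": "Bearer <token>"})`, and no layer in between needs to know what the headers mean.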

fix: enable locks for threadsafe pdfium

CI was cancelled. It seems to have hit a deadlock.

fix: enable locks for threadsafe pdfium

Tested with concurrent processing on macOS and in a Linux container.
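For readers unfamiliar with the pattern, a minimal sketch of serializing calls into a non-thread-safe library with a single lock might look as follows. `FakeRenderer` is a stand-in invented for this example (it only records how many threads are inside `render` at once); it is not pdfium, and `render_threadsafe` and `run_concurrently` are hypothetical helpers rather than the code from the PR.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List

# One process-wide lock guarding every call into the unsafe library.
_library_lock = threading.Lock()


class FakeRenderer:
    """Stand-in for a library that must not be entered by two threads at once."""

    def __init__(self) -> None:
        self.active = 0
        self.max_active = 0
        self._stats_lock = threading.Lock()  # protects the counters only

    def render(self, page: int) -> int:
        with self._stats_lock:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
        time.sleep(0.001)  # simulate work inside the library
        with self._stats_lock:
            self.active -= 1
        return page * 2


def render_threadsafe(renderer: FakeRenderer, page: int) -> int:
    """Hold the global lock for the whole call into the library."""
    with _library_lock:
        return renderer.render(page)


def run_concurrently(renderer: FakeRenderer, pages: List[int], workers: int = 4) -> List[int]:
    """Render pages from several threads; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: render_threadsafe(renderer, p), pages))
```

Note that a plain `threading.Lock` is not reentrant: if a function holding it calls another function that tries to acquire it on the same thread, that thread blocks forever. `threading.RLock` is the usual alternative when nested locked calls are possible.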

Using Docling with custom layout and table recognition models

The models are chosen at the pipeline level. For example, the PDF pipeline (called `StandardPdfPipeline`) is defined in [docling/pipeline/standard_pdf_pipeline.py](../blob/main/docling/pipeline/standard_pdf_pipeline.py). You can make your own pipeline with different...
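The general idea of choosing models at the pipeline level can be sketched without any docling imports. Everything below is hypothetical: `SketchPipeline`, `CustomPipeline`, and the three model functions are invented for illustration and do not reflect the real `StandardPdfPipeline` interface, which should be read from the linked source file.

```python
from typing import Callable, Dict, List

# A "model" here is just a function that enriches a page dictionary.
Model = Callable[[Dict[str, object]], Dict[str, object]]


def default_layout_model(page: Dict[str, object]) -> Dict[str, object]:
    return {**page, "layout": "default"}


def default_table_model(page: Dict[str, object]) -> Dict[str, object]:
    return {**page, "tables": "default"}


def my_layout_model(page: Dict[str, object]) -> Dict[str, object]:
    return {**page, "layout": "custom"}


class SketchPipeline:
    """Hypothetical pipeline: it owns the ordered list of models to apply."""

    def __init__(self) -> None:
        self.models: List[Model] = [default_layout_model, default_table_model]

    def run(self, page: Dict[str, object]) -> Dict[str, object]:
        for model in self.models:
            page = model(page)
        return page


class CustomPipeline(SketchPipeline):
    """Same flow as the parent, with the layout model swapped out."""

    def __init__(self) -> None:
        super().__init__()
        self.models = [my_layout_model, default_table_model]
```

The point of the design is that the calling code stays the same; only the pipeline class that is instantiated decides which models run.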

feat: support xlsm files

Specific language for easyOCR

PDF convert error
