docling Using Docling with custom layout and table recognition models

Is it possible to use Docling with custom models for layout and table recognition?

I would like to use the pipeline but replace the existing models with my own models for layout and table recognition. Does the documentation have any example of using custom AI models?

Nov 05 '24 19:11 ALIYoussef

The choice of the models is done at the Pipeline level. For example, the PDF pipeline (called StandardPdfPipeline) is defined in docling/pipeline/standard_pdf_pipeline.py.

You can make your own pipeline with different models, or simply extend an existing pipeline with additional ones. We have an example which extends the PDF pipeline with an image understanding model. See https://ds4sd.github.io/docling/examples/develop_picture_enrichment/.
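To make the idea concrete, here is a minimal, library-independent sketch of "the pipeline chooses the models". It does not use Docling's real classes or signatures: `ToyPipeline`, `CustomLayoutPipeline`, the stage functions, and the dict-based page are all hypothetical names invented for illustration. The actual structure is in docling/pipeline/standard_pdf_pipeline.py and may differ.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins: a "page" is a plain dict that each stage annotates.
Page = Dict[str, object]
Stage = Callable[[Page], Page]


def default_layout_model(page: Page) -> Page:
    # Placeholder for the pipeline's built-in layout model.
    page["layout"] = "default-layout"
    return page


def default_table_model(page: Page) -> Page:
    # Placeholder for the pipeline's built-in table structure model.
    page["tables"] = "default-tables"
    return page


def my_layout_model(page: Page) -> Page:
    # Placeholder for your own layout model.
    page["layout"] = "custom-layout"
    return page


class ToyPipeline:
    """Hypothetical pipeline: an ordered list of model stages applied to a page."""

    def __init__(self) -> None:
        self.stages: List[Stage] = [default_layout_model, default_table_model]

    def run(self, page: Page) -> Page:
        # Apply each stage in order, passing the annotated page along.
        for stage in self.stages:
            page = stage(page)
        return page


class CustomLayoutPipeline(ToyPipeline):
    """Subclass that swaps only the layout stage and keeps the rest."""

    def __init__(self) -> None:
        super().__init__()
        self.stages[0] = my_layout_model


result = CustomLayoutPipeline().run({"page_no": 1})
print(result)
```

The point of the sketch is only the design: because the model list lives in the pipeline class, replacing a model means subclassing (or writing) a pipeline rather than patching the converter.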

Nov 06 '24 07:11 dolfim-ibm

@ALIYoussef You can of course also contribute an extension to Docling via a PR.

Nov 06 '24 09:11 PeterStaar-IBM

@ALIYoussef We would be excited to see alternative layout or table structure model implementations from the community. The example posted above by @dolfim-ibm is a good way to understand the basic principle of how to add a model. If you want to integrate a new layout model or table model, the other good starting point is the actual implementation of the default models. The code is very readable, see here and here.

Nov 06 '24 09:11 cau-git

I appreciate your support. I will give it a try and keep you posted!

Nov 06 '24 15:11 ALIYoussef

Super, looking forward to what you cook up. A few pointers:

Look into Table 1 of DocLayNet

Look into dp-bench

Nov 06 '24 15:11 PeterStaar-IBM

@PeterStaar-IBM, have you benchmarked docling_v2 on dp-bench? How good is it on tables?

Nov 18 '24 05:11 mllife

The team is doing the evals in the coming weeks.

Nov 18 '24 06:11 PeterStaar-IBM

@PeterStaar-IBM, I found this: https://huggingface.co/ds4sd/docling-models#tableformer

Is this the same model?

Model (TEDS)	Simple table	Complex table	All tables
Tabula	78.0	57.8	67.9
Traprange	60.8	49.9	55.4
Camelot	80.0	66.0	73.0
Acrobat Pro	68.9	61.8	65.3
EDD	91.2	85.4	88.3
TableFormer	95.4	90.1	*93.6*

Nov 18 '24 07:11 mllife

Yes, it is the same model, but we have updated the weights, so I expect the numbers with the current weights to be better.

Nov 18 '24 07:11 PeterStaar-IBM

closing for now!

Nov 18 '24 08:11 PeterStaar-IBM

Hello @PeterStaar-IBM, I see that the OTSL paper reports some numbers (95.5), but they are on PubLayNet tables. Which tables does dp-bench use? Will you share the new dp-bench results now, alongside other tools, including TableFormer+OTSL?

Nov 19 '24 11:11 mllife

@PeterStaar-IBM, any update on this?

Nov 25 '24 05:11 mllife

@dolfim-ibm wrote:

"The choice of the models is done at the Pipeline level. For example, the PDF pipeline (called StandardPdfPipeline) is defined in docling/pipeline/standard_pdf_pipeline.py.

You can make your own pipeline with different models, or simply extend with others. We have an example which extends the PDF pipeline with an image understanding model. See https://ds4sd.github.io/docling/examples/develop_picture_enrichment/."

Hello, I have recently been trying to use PP-StructureV3 for table recognition as a replacement for TableFormer. I am concerned that TableFormer may not be able to handle complex tables (even though the text itself can be read directly by the PDF backend). However, the example URL quoted above is no longer valid. Could you please provide the correct one?

Aug 07 '25 14:08 cenaia