Using Docling with custom layout and table recognition models

Open ALIYoussef opened this issue 1 year ago • 5 comments

Is it possible to use Docling with custom models for layout and table recognition?

I would like to use the pipeline but replace the existing models with my own models for layout and table recognition. Does the documentation have any example of using custom AI models?

ALIYoussef avatar Nov 05 '24 19:11 ALIYoussef

The choice of the models is done at the Pipeline level. For example, the PDF pipeline (called StandardPdfPipeline) is defined in docling/pipeline/standard_pdf_pipeline.py.

You can make your own pipeline with different models, or simply extend with others. We have an example which extends the PDF pipeline with an image understanding model. See https://ds4sd.github.io/docling/examples/develop_picture_enrichment/.
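To make the idea concrete, here is a minimal standalone sketch of the pattern, assuming a pipeline is just an ordered list of model stages. It is not Docling's API: `Page`, `BasePipeline`, `DefaultLayoutModel`, `DefaultTableModel`, `MyTableModel`, and the `stages` attribute are all hypothetical names used for illustration. For the real class layout, see `docling/pipeline/standard_pdf_pipeline.py`.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class Page:
    """Hypothetical stand-in for a parsed page; not a Docling type."""
    number: int
    regions: List[str] = field(default_factory=list)
    tables: List[str] = field(default_factory=list)


class PageModel(Protocol):
    """Any callable that takes pages and returns (annotated) pages."""
    def __call__(self, pages: List[Page]) -> List[Page]: ...


class DefaultLayoutModel:
    def __call__(self, pages: List[Page]) -> List[Page]:
        for page in pages:
            page.regions.append("default-layout")
        return pages


class DefaultTableModel:
    def __call__(self, pages: List[Page]) -> List[Page]:
        for page in pages:
            page.tables.append("default-table")
        return pages


class BasePipeline:
    """Hypothetical pipeline: runs every stage over the pages, in order."""
    def __init__(self) -> None:
        self.stages: List[PageModel] = [DefaultLayoutModel(), DefaultTableModel()]

    def run(self, pages: List[Page]) -> List[Page]:
        for model in self.stages:
            pages = model(pages)
        return pages


class MyTableModel:
    """Placeholder for a custom table-structure model."""
    def __call__(self, pages: List[Page]) -> List[Page]:
        for page in pages:
            page.tables.append("custom-table")
        return pages


class CustomTablePipeline(BasePipeline):
    """Subclass that swaps only the table stage and keeps the rest."""
    def __init__(self) -> None:
        super().__init__()
        self.stages = [
            MyTableModel() if isinstance(m, DefaultTableModel) else m
            for m in self.stages
        ]


result = CustomTablePipeline().run([Page(number=1)])
print(result[0].regions, result[0].tables)
```

The point of the design is that the model choice lives in the pipeline class, so replacing a model means subclassing the pipeline and overriding the stage list rather than patching the model itself.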

dolfim-ibm avatar Nov 06 '24 07:11 dolfim-ibm

@ALIYoussef You can of course also contribute an extension to Docling via a PR.

PeterStaar-IBM avatar Nov 06 '24 09:11 PeterStaar-IBM

@ALIYoussef We would be excited to see alternative layout or table structure model implementations from the community. The example posted above by @dolfim-ibm is a good way to understand the basic principle of how to add a model. If you want to plug in a new layout model or table model, the other good starting point is the actual implementation of the default models. The code is very readable, see here and here.

cau-git avatar Nov 06 '24 09:11 cau-git

I appreciate your support. I will give it a try and keep you posted!

ALIYoussef avatar Nov 06 '24 15:11 ALIYoussef

Super, looking forward to what you cook up. A few pointers:

  1. Look into Table 1 of DocLayNet
     [Screenshot 2024-11-06 at 16 47 01]
  2. Look into the dp-bench
     [Screenshot 2024-11-06 at 16 47 18]

PeterStaar-IBM avatar Nov 06 '24 15:11 PeterStaar-IBM

@PeterStaar-IBM, have you benchmarked docling_v2 on the dp-bench? How good is it on tables?

mllife avatar Nov 18 '24 05:11 mllife

The team is doing the evals in the coming weeks.

PeterStaar-IBM avatar Nov 18 '24 06:11 PeterStaar-IBM

@PeterStaar-IBM, I found this: https://huggingface.co/ds4sd/docling-models#tableformer

Is this the same?

| Model (TEDS) | Simple table | Complex table | All tables |
| --- | --- | --- | --- |
| Tabula | 78.0 | 57.8 | 67.9 |
| Traprange | 60.8 | 49.9 | 55.4 |
| Camelot | 80.0 | 66.0 | 73.0 |
| Acrobat Pro | 68.9 | 61.8 | 65.3 |
| EDD | 91.2 | 85.4 | 88.3 |
| TableFormer | 95.4 | 90.1 | 93.6 |

mllife avatar Nov 18 '24 07:11 mllife

Yes, same model, but we have some updated weights, so I actually expect the numbers with the current weights to be better.

PeterStaar-IBM avatar Nov 18 '24 07:11 PeterStaar-IBM

closing for now!

PeterStaar-IBM avatar Nov 18 '24 08:11 PeterStaar-IBM

Hello @PeterStaar-IBM, I see the OTSL paper has some numbers (95.5), but they are on PubLayNet tables. Which tables does dp-bench use? Will you share the new results on dp-bench now, alongside other tools, including TableFormer+OTSL?

mllife avatar Nov 19 '24 11:11 mllife

@PeterStaar-IBM, any update on this?

mllife avatar Nov 25 '24 05:11 mllife

> The choice of the models is done at the Pipeline level. For example, the PDF pipeline (called StandardPdfPipeline) is defined in docling/pipeline/standard_pdf_pipeline.py.
>
> You can make your own pipeline with different models, or simply extend with others. We have an example which extends the PDF pipeline with an image understanding model. See https://ds4sd.github.io/docling/examples/develop_picture_enrichment/.

Hello developer, I have recently been trying to use PP-StructureV3 for table recognition as a replacement for TableFormer. I am concerned that TableFormer may not be able to handle complex table recognition tasks (even though the text itself can be read directly by the PDF backend). However, the URL quoted above is no longer valid. Could you please provide the correct URL?

cenaia avatar Aug 07 '25 14:08 cenaia