Using Docling with custom layout and table recognition models
Is it possible to use Docling with custom models for layout and table recognition?
I would like to use the pipeline but replace the existing models with my own models for layout and table recognition. Does the documentation have an example of using custom AI models?
The choice of the models is done at the Pipeline level. For example, the PDF pipeline (called StandardPdfPipeline) is defined in docling/pipeline/standard_pdf_pipeline.py.
You can make your own pipeline with different models, or simply extend with others. We have an example which extends the PDF pipeline with an image understanding model. See https://ds4sd.github.io/docling/examples/develop_picture_enrichment/.
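To make the "models are chosen at the pipeline level" idea concrete, here is a minimal, library-free sketch of the pattern. Every name in it (`Page`, `SketchPdfPipeline`, `MyLayoutModel`, `build_pipe`, and so on) is a hypothetical stand-in for illustration and is not Docling's actual API; check docling/pipeline/standard_pdf_pipeline.py for the real class layout before adapting it.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class Page:
    """Hypothetical stand-in for a parsed page (not Docling's Page type)."""
    number: int
    annotations: List[str] = field(default_factory=list)


class DefaultLayoutModel:
    """Placeholder for a stock layout model."""
    def __call__(self, pages: Iterable[Page]) -> Iterator[Page]:
        for page in pages:
            page.annotations.append("layout:default")
            yield page


class DefaultTableModel:
    """Placeholder for a stock table-structure model."""
    def __call__(self, pages: Iterable[Page]) -> Iterator[Page]:
        for page in pages:
            page.annotations.append("table:default")
            yield page


class MyLayoutModel:
    """Your own layout model; same call contract as the default one."""
    def __call__(self, pages: Iterable[Page]) -> Iterator[Page]:
        for page in pages:
            page.annotations.append("layout:custom")
            yield page


class MyTableModel:
    """Your own table model; same call contract as the default one."""
    def __call__(self, pages: Iterable[Page]) -> Iterator[Page]:
        for page in pages:
            page.annotations.append("table:custom")
            yield page


class SketchPdfPipeline:
    """The pipeline owns the ordered list of models applied to each page."""
    def __init__(self) -> None:
        self.build_pipe = [DefaultLayoutModel(), DefaultTableModel()]

    def run(self, pages: Iterable[Page]) -> List[Page]:
        stream = pages
        for model in self.build_pipe:
            stream = model(stream)  # chain the models lazily, in order
        return list(stream)


class CustomModelsPipeline(SketchPdfPipeline):
    """Subclass that swaps in custom models; the rest of the pipeline is unchanged."""
    def __init__(self) -> None:
        super().__init__()
        self.build_pipe = [MyLayoutModel(), MyTableModel()]


def annotate(pipeline: SketchPdfPipeline, n_pages: int = 2) -> List[List[str]]:
    """Run a pipeline over n_pages empty pages and return each page's annotations."""
    pages = [Page(number=i) for i in range(1, n_pages + 1)]
    return [page.annotations for page in pipeline.run(pages)]
```

Calling `annotate(CustomModelsPipeline())` runs the same loop as the base pipeline but with the replacement models. The point of the design is that a custom model only has to honour the same call contract as the one it replaces.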
@ALIYoussef You can of course also contribute an extension to Docling via a PR.
@ALIYoussef We would be excited to see alternative layout or table structure model implementations from the community. The example posted above by @dolfim-ibm is a good way to understand the basic principle of how to add a model. If you want to implement a new layout or table model, the other good starting point is the actual implementation of the default models. The code is very readable, see here and here.
I appreciate your support. I will give it a try and keep you posted!
Super, looking forward to what you cook up. A few pointers:
- Look at Table 1 of DocLayNet
- Look into dp-bench
@PeterStaar-IBM, have you benchmarked docling_v2 on dp-bench? How good is it on tables?
The team is doing the evals in the coming weeks.
@PeterStaar-IBM, I found this: https://huggingface.co/ds4sd/docling-models#tableformer
Is this the same?
| Model (TEDS) | Simple table | Complex table | All tables |
|---|---|---|---|
| Tabula | 78.0 | 57.8 | 67.9 |
| Traprange | 60.8 | 49.9 | 55.4 |
| Camelot | 80.0 | 66.0 | 73.0 |
| Acrobat Pro | 68.9 | 61.8 | 65.3 |
| EDD | 91.2 | 85.4 | 88.3 |
| TableFormer | 95.4 | 90.1 | 93.6 |
Yes, it is the same model, but we have some updated weights, so I actually expect the numbers with the current weights to be better.
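For readers unfamiliar with the metric in the table header: TEDS (Tree-Edit-Distance-based Similarity) compares the predicted and ground-truth tables as HTML trees. As I understand the usual definition (an assumption here, since the thread does not spell it out), it is:

```latex
\mathrm{TEDS}(T_a, T_b) = 1 - \frac{\mathrm{EditDist}(T_a, T_b)}{\max\left(|T_a|, |T_b|\right)}
```

Here $T_a$ and $T_b$ are the two table trees, $\mathrm{EditDist}$ is the tree edit distance between them, and $|T|$ is the number of nodes in $T$. The scores in the table above appear to be this value expressed as a percentage, so higher is better.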
closing for now!
Hello @PeterStaar-IBM, I see that the OTSL paper has some numbers (95.5), but they are on PubLayNet tables. Which tables does dp-bench use? Will you now share the new results on dp-bench, compared with other tools and including TableFormer+OTSL?
@PeterStaar-IBM, any update on this?
The choice of the models is done at the Pipeline level. For example, the PDF pipeline (called StandardPdfPipeline) is defined in docling/pipeline/standard_pdf_pipeline.py.
You can make your own pipeline with different models, or simply extend with others. We have an example which extends the PDF pipeline with an image understanding model. See https://ds4sd.github.io/docling/examples/develop_picture_enrichment/.
Hello developers, I have recently been trying to use PP-StructureV3 to improve table recognition, as a replacement for TableFormer. I am concerned that TableFormer may not be able to handle complex table recognition tasks (even though the text can be recognized directly by the PDF backend). However, this URL is no longer valid. Could you please provide a correct URL?