
Get your documents ready for gen AI

Results 434 docling issues

…t target encoding doesn't have all characters Specify the encoding when writing the output file to avoid errors when the default target encoding doesn't have all characters. UTF-8 seems like the most universal...
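The failure mode described here can be sketched with the standard library alone. This is a minimal illustration, not docling's own code: `write_markdown` is a hypothetical helper, and the sample text stands in for whatever the converter exports.

```python
from pathlib import Path


def write_markdown(path: Path, text: str) -> None:
    """Write exported text with an explicit encoding (hypothetical helper).

    Without the encoding argument, Python uses the locale's preferred
    encoding, which may not be able to represent every character.
    """
    path.write_text(text, encoding="utf-8")


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False
```

A narrow default such as `cp1252` rejects text like `"日本語"`, while UTF-8 accepts it, which is why passing `encoding="utf-8"` explicitly avoids the error.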

Check out airesearch.js.org for a lot of similar PDF and HTML to Markdown tools. It'd be great to have a serverless, frontend JS version so I can integrate the algorithms

I tried the example from Installation (Alternative OCR Engines) and got `cannot import name 'PipelineOptions' from 'docling.datamodel.base_models'`. Then I tried https://ds4sd.github.io/docling/examples/custom_convert/ (PyPdfium with EasyOCR) and got `ValueError: "PipelineOptions" object has...

I tried extracting data from a PDF containing the image below: ![image](https://github.com/user-attachments/assets/7d8b668a-d8b3-4c08-80b5-f77e7f93ad7c) However, the result was: ![image](https://github.com/user-attachments/assets/42d0be1a-83da-4b9e-a458-069bd871b4a5) The output was not accurate, especially for the basic mathematical expressions.

It would be nice to have an option to export Markdown files with images as references to files instead of embedding them in the document as Base64. This might...
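Until such an option exists, one possible workaround is to post-process the exported Markdown. The sketch below is an assumption-laden illustration, not a docling feature: `externalize_images` is a hypothetical helper, it assumes images are embedded as `data:image/<type>;base64,...` URIs, and it assumes the Markdown file will be saved in the same directory as the extracted images.

```python
import base64
import re
from pathlib import Path

# Matches Markdown images whose target is an embedded Base64 data URI.
DATA_URI_IMAGE = re.compile(
    r"!\[(?P<alt>[^\]]*)\]"
    r"\(data:image/(?P<ext>[A-Za-z0-9]+);base64,(?P<data>[A-Za-z0-9+/=]+)\)"
)


def externalize_images(markdown: str, out_dir: Path, prefix: str = "image") -> str:
    """Write each embedded image to out_dir and reference it by file name."""
    out_dir.mkdir(parents=True, exist_ok=True)
    counter = 0

    def _replace(match: re.Match) -> str:
        nonlocal counter
        counter += 1
        name = f"{prefix}_{counter}.{match.group('ext')}"
        (out_dir / name).write_bytes(base64.b64decode(match.group("data")))
        return f"![{match.group('alt')}]({name})"

    return DATA_URI_IMAGE.sub(_replace, markdown)
```

Image types whose MIME subtype contains other characters (for example `svg+xml`) are left untouched by this pattern.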

I ran DocumentConverter on the PDF file featured in the example (https://arxiv.org/pdf/2408.09869) with `TableFormerMode.ACCURATE` and (`do_cell_matching = False` or `do_cell_matching = True`). Results for Table 1 (`item.export_to_html()`): Does this reflect...

arXiv provides static HTML versions of most papers using LaTeXML. The HTML contents are well structured by rich ltx_xxxx CSS class names. It should be lightning fast to parse those paper HTML files...
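A rough sketch of what class-driven parsing could look like, using only the standard library. The class names used in the usage note (`ltx_title`, `ltx_p`) are assumed examples of the `ltx_xxxx` pattern, and `extract_text` is a hypothetical helper rather than anything docling provides.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element carrying a given CSS class."""

    def __init__(self, target_class: str) -> None:
        super().__init__(convert_charrefs=True)
        self.target_class = target_class
        self.depth = 0          # 0 means we are outside any matching element
        self.chunks: list[str] = []
        self.results: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1     # nested element inside a match
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.depth = 1
            self.chunks = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Normalize whitespace in the collected text.
            self.results.append(" ".join("".join(self.chunks).split()))

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_text(html: str, target_class: str) -> list[str]:
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.results
```

For instance, `extract_text(page_html, "ltx_title")` would return the text of each element tagged with that class, assuming the page uses it and is reasonably well-formed.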

`Docling version: 2.3.1`, `Docling Core version: 2.3.1`, `Docling IBM Models version: 2.0.3`, `Docling Parse version: 2.0.2`. Fine software: I have compared it with many alternatives. (BTW, for some newish...

bug
needs investigation

Hello, I’m encountering an issue when extracting tables containing merged rows. Specifically, when a cell spans multiple rows, the expected behavior is to assign it a `row_span` value greater than...
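To make the expected behavior concrete, here is a small sketch of how a row-spanning cell maps to HTML output. The `Cell` dataclass and `render_html` function are hypothetical stand-ins written for illustration; only the `row_span` field name is taken from the report, and none of this is docling's table model.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Cell:
    """Hypothetical table cell; row and col are zero-based start positions."""
    text: str
    row: int
    col: int
    row_span: int = 1


def render_html(cells: list[Cell], n_rows: int) -> str:
    """Render cells as an HTML table, emitting rowspan for merged rows."""
    rows: list[list[str]] = [[] for _ in range(n_rows)]
    for cell in sorted(cells, key=lambda c: (c.row, c.col)):
        span = f' rowspan="{cell.row_span}"' if cell.row_span > 1 else ""
        rows[cell.row].append(f"<td{span}>{escape(cell.text)}</td>")
    body = "".join(f"<tr>{''.join(row)}</tr>" for row in rows)
    return f"<table>{body}</table>"
```

A merged cell appears once, in the row where it starts, and the rows it covers simply omit that column.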

bug
table structure

We need to add a parameter to enable or disable the removal of empty columns/rows in the Table Model post-processing. Details in the discussion: https://github.com/DS4SD/docling/discussions/201 See if there are some other...
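One way such a switch could behave, sketched on a plain grid of strings. The function name `postprocess_grid` and the parameter `remove_empty` are assumptions made for this example; the actual post-processing code and its eventual parameter name may differ. The grid is assumed to be rectangular.

```python
def postprocess_grid(grid: list[list[str]], remove_empty: bool = True) -> list[list[str]]:
    """Optionally drop rows and columns that contain only blank cells."""
    if not remove_empty:
        # Feature disabled: return an unchanged copy of the grid.
        return [list(row) for row in grid]

    # Keep rows that have at least one non-blank cell.
    rows = [row for row in grid if any(cell.strip() for cell in row)]
    if not rows:
        return []

    # Keep columns that have at least one non-blank cell in the remaining rows.
    keep = [j for j in range(len(rows[0])) if any(row[j].strip() for row in rows)]
    return [[row[j] for j in keep] for row in rows]
```

With the flag off, callers get the table exactly as predicted, which preserves intentionally blank columns or rows.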

enhancement
priority:medium
table structure