docling icon indicating copy to clipboard operation
docling copied to clipboard

Not All Headers Are Identified during PDF to MD conversion

Open sallahbaksh opened this issue 10 months ago • 0 comments

Bug

When converting from PDF to Markdown, not all headers are being identified. For example, for the attached PDF, the headers Sec. 28-4. - Applicability. and Sec. 28-5. - Repeal. are identified correctly, but Sec. 28-1. - Short title. and Sec. 28-2. - Purpose., etc. are not identified as headers.

Steps to reproduce

Use DocumentConverter() to convert pdf to md (Stafford County - VA Zoning Ordinance.pdf). Observe that Sec. 28-1. - Short title., Sec. 28-2. - Purpose., Sec. 28-6. - Conflict of provisions., Sec. 28-7. - Severability. are not identified as headers, while Sec. 28-4. - Applicability. and Sec. 28-5. - Repeal. are.

Image

Docling version

Docling version: 2.15 Docling Core version: 2.17.1 Docling IBM Models version: 3.3.0 Docling Parse version: 3.2.0 Python: cpython-312 (3.12.8) Platform: Windows-11-10.0.22631-SP0 ...

Python version

Python 3.12.8

sallahbaksh avatar Feb 04 '25 19:02 sallahbaksh