docling
docling_parse_v2 splits words into characters or merges adjacent words
Bug
In some cases, docling_parse_v2 splits a word into its individual characters or joins adjacent words together in the Markdown output.
Example 1:
Original text: products that were recently iroduced
Markdown: products that were re c e n t l y i roduced

Example 2:
Original text: Tables 2–5 show the results of partitioning the graphs in our test suite on
Markdown: Tables 2-5 sho w theresultsfpartitioningegraphsinourtest suite on
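To check other passages for the same symptom, a small standard-library sketch like the one below may help. It is not part of docling; the helper names `strip_whitespace`, `differs_only_in_spacing`, and `count_single_char_tokens` are hypothetical and introduced only for illustration. It assumes you already have the expected text and the exported Markdown as plain strings.

```python
import re


def strip_whitespace(text: str) -> str:
    """Remove all whitespace so only the character sequence remains."""
    return re.sub(r"\s+", "", text)


def differs_only_in_spacing(original: str, converted: str) -> bool:
    """True when the two strings differ, but only in where whitespace falls."""
    return (
        original != converted
        and strip_whitespace(original) == strip_whitespace(converted)
    )


def count_single_char_tokens(text: str) -> int:
    """Count whitespace-separated tokens that are exactly one character long.

    A sudden jump in this count is a rough hint that words were split
    into characters (hypothetical heuristic, not a docling feature).
    """
    return sum(1 for token in text.split() if len(token) == 1)


# Strings quoted from Example 1 of this report.
original = "products that were recently iroduced"
converted = "products that were re c e n t l y i roduced"

print(differs_only_in_spacing(original, converted))
print(count_single_char_tokens(original), count_single_char_tokens(converted))
```

As quoted, Example 2 would not be flagged as spacing-only by this check, because its two strings also differ in non-whitespace characters (for instance the dash in "2–5" versus "2-5").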
Steps to reproduce
...
Docling version
Docling version: 2.21.0
Docling Core version: 2.18.0
Docling IBM Models version: 3.3.0
Docling Parse version: 3.3.0
Python: cpython-311 (3.11.4)
Platform: macOS-14.6.1-arm64-arm-64bit
Python version
Python 3.11.4