fix: ParserError EOF inside string (#470)

Open guglie opened this issue 1 year ago • 1 comment

Do not interpret a quote at the start of text read by tesseract as TSV cell quoting. Otherwise, a ParserError ("EOF inside string") is raised when the tesseract TSV output contains rows like these:

5	1	45	1	24	1	1557	1119	104	43	79.578239	"Example
5	1	45	1	24	2	1675	1119	76	43	93.807220	rows”
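
As a minimal sketch of the idea (not necessarily the exact change in this PR), assume the TSV is parsed with pandas: passing `quoting=csv.QUOTE_NONE` to `pd.read_csv` makes the parser treat a leading `"` as ordinary text instead of the start of a quoted cell. The helper name `parse_tesseract_tsv` and the sample string are hypothetical; the header row follows the column names tesseract writes in TSV mode.

```python
import csv
import io

import pandas as pd

# Sample shaped like tesseract TSV output. The text cell of the first data
# row starts with a double quote that is never closed by a matching '"'.
TSV_SAMPLE = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    '5\t1\t45\t1\t24\t1\t1557\t1119\t104\t43\t79.578239\t"Example\n'
    "5\t1\t45\t1\t24\t2\t1675\t1119\t76\t43\t93.807220\trows”\n"
)


def parse_tesseract_tsv(data: str) -> pd.DataFrame:
    """Hypothetical helper: parse TSV text without any quote handling."""
    # QUOTE_NONE disables quote processing, so '"Example' stays a plain cell
    # value rather than opening a quoted string that runs to end of input.
    return pd.read_csv(io.StringIO(data), sep="\t", quoting=csv.QUOTE_NONE)


df = parse_tesseract_tsv(TSV_SAMPLE)
print(df[["word_num", "conf", "text"]])
```

With the default quoting, the parser would read the leading `"` as the opening of a quoted field and keep scanning for the closing quote, which is what leads to the error named in the title.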

Issue resolved by this Pull Request: Resolves #470

Checklist:

  • [X] Documentation has been updated, if necessary.
  • [X] Examples have been added, if necessary.
  • [X] Tests have been added, if necessary.

guglie, Nov 29 '24 16:11

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • [X] title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?:

mergify[bot], Nov 29 '24 16:11

@nikos-livathinos Can you quickly review this? I would like your approval before we merge.

PeterStaar-IBM, Dec 02 '24 07:12