
feat: Improve the LayoutPostprocessor

Open • nikos-livathinos opened this issue 9 months ago • 1 comment

LayoutPostprocessor improvements:

  • Do not discard Formula clusters that have no assigned cells (neither programmatic nor OCR)
  • Add post-processing heuristics that inspect the text to decide whether a cluster label must be corrected
    • (e.g. change the label to caption when the text starts with "fig" or "table")
  • Go through issues labeled "layout", collect the provided inputs, and decide whether to create CVAT ground truth (GT) for them
  • Disable the orphan-creation code and check whether it improves mAP
  • ...
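A minimal sketch of the first two rules above, assuming a simplified, hypothetical `Cluster` type with a string `label`, its assembled `text`, and a list of `cells`. These names and the helpers `relabel_captions` and `drop_empty_clusters` are illustrative only and are not docling's `LayoutPostprocessor` API. The sketch also assumes that clusters without cells are otherwise dropped, which is the behavior the first bullet wants to avoid for formulas.

```python
import re
from dataclasses import dataclass, field


# Hypothetical, simplified stand-in for a layout cluster (NOT docling's class).
@dataclass
class Cluster:
    label: str                                 # e.g. "text", "caption", "formula"
    text: str = ""                             # text assembled from the cluster's cells
    cells: list = field(default_factory=list)  # programmatic or OCR cells


# Literal reading of the bullet: text that starts with "fig" or "table",
# ignoring case and leading whitespace.
CAPTION_PREFIX = re.compile(r"^\s*(fig|table)", re.IGNORECASE)


def relabel_captions(clusters):
    """Change "text" clusters to "caption" when their text starts with fig/table."""
    for c in clusters:
        if c.label == "text" and CAPTION_PREFIX.match(c.text):
            c.label = "caption"
    return clusters


def drop_empty_clusters(clusters, keep_labels=("formula",)):
    """Drop clusters without cells, except labels that may legitimately have none."""
    return [c for c in clusters if c.cells or c.label in keep_labels]


if __name__ == "__main__":
    clusters = [
        Cluster("text", "Figure 2: Overview of the pipeline", cells=[1]),
        Cluster("text", "The results are shown below.", cells=[2]),
        Cluster("formula"),   # no cells, but kept
        Cluster("picture"),   # no cells, dropped
    ]
    clusters = drop_empty_clusters(relabel_captions(clusters))
    print([c.label for c in clusters])  # ['caption', 'text', 'formula']
```

The plain prefix check follows the bullet literally, so it would also relabel text such as "Tables are useful" or "Fighting noise"; a production heuristic would likely need a stricter pattern (for example, requiring a number after the prefix) and should be checked against layout ground truth before adoption.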

Checklist:

  • [x] Documentation has been updated, if necessary.
  • [x] Examples have been added, if necessary.
  • [x] Tests have been added, if necessary.

nikos-livathinos • Feb 12 '25 09:02

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • [X] title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

mergify[bot] • Feb 12 '25 09:02

Codecov Report

All modified and coverable lines are covered by tests ✅


codecov[bot] • Apr 15 '25 11:04

Synced with @nikos-livathinos; this is obsolete now.

PeterStaar-IBM • Jun 02 '25 07:06
