docling
feat: Improve the LayoutPostprocessor
LayoutPostprocessor improvements:
- Do not discard formulas that have no assigned cells (neither programmatic nor OCR)
- Add post-processing heuristics that look at the text to decide whether a cluster label must be corrected (e.g. flip a text label to a caption label when the text starts with "fig" or "table")
- Go through the issues labeled "layout", pick the provided inputs, and decide whether to create CVAT ground truth (GT) for them
- Disable orphan creation code and check if it improves mAP
- ...
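A minimal sketch of the caption heuristic described in the list above, for illustration only. The `Cluster` dataclass, the label strings `"text"` and `"caption"`, and the `relabel_caption` helper are assumptions made for this example and are not docling's actual classes or API; the only behavior taken from the description is the prefix check on "fig" and "table".

```python
import re
from dataclasses import dataclass


# Hypothetical stand-in for a layout cluster; the real class in docling
# carries more fields (bounding box, cells, confidence, ...).
@dataclass
class Cluster:
    label: str
    text: str


# Text that starts with "fig" or "table", ignoring case and leading whitespace.
CAPTION_PREFIX = re.compile(r"^\s*(fig|table)", re.IGNORECASE)


def relabel_caption(cluster: Cluster) -> Cluster:
    """Return a copy labeled 'caption' if a 'text' cluster looks like a caption."""
    if cluster.label == "text" and CAPTION_PREFIX.match(cluster.text):
        return Cluster(label="caption", text=cluster.text)
    # Any other label, or text without the prefix, is left unchanged.
    return cluster


if __name__ == "__main__":
    for sample in ("Figure 3: Overview", "Table 1. Results", "The figure shows"):
        print(sample, "->", relabel_caption(Cluster("text", sample)).label)
```

A bare prefix check like this would also relabel ordinary sentences such as "Tables are parsed separately", so a real heuristic would likely need a stricter pattern (for example, requiring a number after the keyword).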
Checklist:
- [x] Documentation has been updated, if necessary.
- [x] Examples have been added, if necessary.
- [x] Tests have been added, if necessary.
Merge Protections
Your pull request matches the following merge protections and will not be merged until they are valid.
🟢 Enforce conventional commit
Wonderful, this rule succeeded.
Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
- [X] title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
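The title rule above can be checked locally before opening a pull request. The sketch below applies the same regular expression with Python's `re` module; the `is_conventional` helper name is an assumption for this example, and the merge-protection service may evaluate the pattern with a different regex engine.

```python
import re

# Pattern copied from the merge protection rule: a commit type, an optional
# scope in parentheses, an optional "!" for breaking changes, then a colon.
TITLE_RULE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the pull request title matches the rule."""
    return TITLE_RULE.search(title) is not None


if __name__ == "__main__":
    for title in (
        "feat: Improve the LayoutPostprocessor",
        "fix(layout)!: keep formulas without cells",
        "Improve the LayoutPostprocessor",
    ):
        print(f"{title!r}: {is_conventional(title)}")
```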
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Synced with @nikos-livathinos; this is obsolete now.