doctr
Vertical block separation
🚀 The feature
Currently, builder.py has a paragraph_break parameter for merging sub_lines that are close enough to each other.
I would appreciate a similar parameter for merging stacked lines that are vertically close enough into a single block.
Motivation, pitch
Currently, when I run docTR on the above image and on images with similar lower thirds, result.render() returns the following, where \n\n separates different blocks. I would like to be able to direct the builder to merge lines that are this close into one block containing two lines, rather than getting two blocks that contain one line each.
REP. PAUL LEONARD\n\nD-DAYTON
Here is the document object:
Document(
  (pages): [Page(
    dimensions=(360, 480)
    (blocks): [
      Block(
        (lines): [Line(
          (words): [
            Word(value='REP.', confidence=0.99),
            Word(value='PAUL', confidence=1.0),
            Word(value='LEONARD', confidence=1.0),
          ]
        )]
        (artefacts): []
      ),
      Block(
        (lines): [Line(
          (words): [Word(value='D-DAYTON', confidence=0.99)]
        )]
        (artefacts): []
      ),
    ]
  )]
)
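To make the request concrete, here is a rough sketch of the behaviour I have in mind. It is standalone Python, not docTR's API: SimpleLine, merge_stacked_lines, render_blocks and the max_vertical_gap parameter are hypothetical names, and I am assuming each line carries a relative ((xmin, ymin), (xmax, ymax)) box. The coordinates in the usage example are made up for illustration, since the real geometry is not shown above.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Relative bounding box: ((xmin, ymin), (xmax, ymax)), values in [0, 1]
Box = Tuple[Tuple[float, float], Tuple[float, float]]


@dataclass
class SimpleLine:
    """Hypothetical stand-in for an OCR line: its text and relative box."""
    text: str
    geometry: Box


def merge_stacked_lines(
    lines: List[SimpleLine], max_vertical_gap: float = 0.02
) -> List[List[SimpleLine]]:
    """Group lines into blocks.

    A line joins the current block when the gap between its top edge and
    the previous line's bottom edge is at most `max_vertical_gap`
    (hypothetical parameter, relative to page height) and the two lines
    overlap horizontally. Otherwise it starts a new block.
    """
    blocks: List[List[SimpleLine]] = []
    # Process lines from top to bottom
    for line in sorted(lines, key=lambda ln: ln.geometry[0][1]):
        if blocks:
            prev = blocks[-1][-1]
            (pxmin, _), (pxmax, pymax) = prev.geometry
            (xmin, ymin), (xmax, _) = line.geometry
            gap = ymin - pymax
            overlaps = xmin < pxmax and pxmin < xmax
            if gap <= max_vertical_gap and overlaps:
                blocks[-1].append(line)
                continue
        blocks.append([line])
    return blocks


def render_blocks(blocks: List[List[SimpleLine]]) -> str:
    """Join lines with a single newline and blocks with a blank line."""
    return "\n\n".join("\n".join(ln.text for ln in block) for block in blocks)


# Usage with made-up coordinates for the two lines of the lower third
lines = [
    SimpleLine("REP. PAUL LEONARD", ((0.10, 0.80), (0.55, 0.85))),
    SimpleLine("D-DAYTON", ((0.10, 0.86), (0.30, 0.90))),
]
print(repr(render_blocks(merge_stacked_lines(lines, max_vertical_gap=0.02))))
```

With a threshold of 0.02 the two lines end up in one block separated by a single \n, while a threshold of 0.0 keeps them as two blocks, which matches the current output. A relative threshold seemed the natural choice to me because paragraph_break is also expressed relative to the page, but an absolute or line-height-based threshold would work too.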
Alternatives
No response
Additional context
No response