chore: use word-level bounding boxes for `add_pytesseract_bbox_to_elements`

Open · Coniferish opened this issue 1 year ago · 1 comment

Per the discussion here (https://github.com/Unstructured-IO/unstructured/pull/1259/files#r1312235977), `add_pytesseract_bbox_to_elements` can be improved by using `pytesseract.image_to_data` to get word-level bounding boxes and then using vector math to find the coordinates of each element.
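A rough sketch of the idea, not the library's implementation: `pytesseract.image_to_data` with `Output.DICT` returns parallel lists (`text`, `left`, `top`, `width`, `height`, among others), and an element's box can be taken as the union of the boxes of its words. The helper names (`word_boxes`, `union_box`, `element_bbox`, `ocr_words`) and the naive token-matching rule are hypothetical and only for illustration.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def word_boxes(ocr_data: Dict[str, list]) -> List[Tuple[str, Box]]:
    """Turn a dict shaped like pytesseract's Output.DICT into (word, box) pairs."""
    words = []
    for text, left, top, width, height in zip(
        ocr_data["text"],
        ocr_data["left"],
        ocr_data["top"],
        ocr_data["width"],
        ocr_data["height"],
    ):
        word = str(text).strip()
        if not word:
            continue  # page/block/paragraph/line rows carry no word text
        words.append((word, (left, top, left + width, top + height)))
    return words


def union_box(boxes: List[Box]) -> Optional[Box]:
    """Smallest box that contains every box in the list, or None if it is empty."""
    if not boxes:
        return None
    x1s, y1s, x2s, y2s = zip(*boxes)
    return (min(x1s), min(y1s), max(x2s), max(y2s))


def element_bbox(element_text: str, ocr_data: Dict[str, list]) -> Optional[Box]:
    """Assumed matching rule: a word belongs to the element if it is one of
    the element's whitespace-separated tokens. Real matching would need to
    handle repeated words and OCR errors."""
    tokens = set(element_text.split())
    return union_box([box for word, box in word_boxes(ocr_data) if word in tokens])


def ocr_words(image) -> Dict[str, list]:
    """Run Tesseract on an image and return word-level data as a dict."""
    import pytesseract  # imported lazily so the helpers above work without Tesseract

    return pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
```

With a PIL image in hand, `element_bbox(element.text, ocr_words(image))` would give a candidate box for one element; how that box is attached to the element is left out here.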

Coniferish, Aug 31 '23 22:08

Scheduled for 2024 Q3

orlandounstructured, Feb 12 '24 19:02