
bug/PDF elements out of order

Open ron-unstructured opened this issue 1 year ago • 5 comments

Describe the bug There is a discrepancy in the element order when partitioning a PDF. From the screenshots, the blue and red circles intended to highlight text are switched in position in the output image, compared to their correct placement in the original PDF.

To Reproduce Partition the PDF with the Python SDK using the auto, fast, and hi_res strategies.
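A minimal sketch of the reproduction step, assuming `partition_pdf` from `unstructured.partition.pdf` is what "PDF partition" refers to here. The helper name `element_order_by_strategy` and the file name `example.pdf` are hypothetical and not from the original report. It only records the order in which elements are returned for each strategy, so the three runs can be compared by eye:

```python
STRATEGIES = ("auto", "fast", "hi_res")  # the strategies named in the report


def element_order_by_strategy(partition_fn, filename, strategies=STRATEGIES):
    """Return {strategy: [(element type name, text prefix), ...]} in emitted order.

    partition_fn is passed in so the helper can be exercised without the
    library installed; in practice it would be unstructured's partition_pdf.
    """
    order = {}
    for strategy in strategies:
        elements = partition_fn(filename=filename, strategy=strategy)
        order[strategy] = [
            (type(el).__name__, getattr(el, "text", "")[:40]) for el in elements
        ]
    return order


def main(pdf_path):
    # Imported lazily so the helper above stays usable without unstructured.
    from unstructured.partition.pdf import partition_pdf

    for strategy, items in element_order_by_strategy(partition_pdf, pdf_path).items():
        print(f"--- {strategy} ---")
        for index, (kind, text) in enumerate(items):
            print(index, kind, repr(text))
```

Calling `main("example.pdf")` with the affected document should print one ordered listing per strategy, which can then be checked against the reading order in the original PDF.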

Expected behavior The element order in the output image should match the original PDF, with the blue and red circles in the same positions and with the same color coding as in the source document.

Screenshots PDF partition

Environment Info
OS version: macOS-14.2.1-arm64-arm-64bit
Python version: 3.10.12
unstructured version: 0.12.1.dev11
unstructured-inference version: 0.7.18
pytesseract version: 0.3.10
Torch version: 2.1.1
Detectron2 is not installed
PaddleOCR is not installed
Libmagic version: ==> libmagic: stable 5.45 (bottled)
LibreOffice version: ==> libreoffice: 7.6.4

Additional context Similar issue: https://github.com/Unstructured-IO/unstructured/issues/2208

ron-unstructured, Jan 24 '24 00:01