unstructured icon indicating copy to clipboard operation
unstructured copied to clipboard

bug/Vertical text in pdf inverted

Open CorentinvdBdO opened this issue 2 years ago • 1 comments

Describe the bug In a pdf, a line of vertical text is oriented text bottom facing page right. When doing "partition(filename)", the text is inverted and has random spaces added ("é fi i t r e c" instead of "certifié"). The rest of the text in the same page is properly treated. This is not an OCR.

To Reproduce Use a pdf with vertical text.

Expected behavior Proper order conserved and no random spaces added. Pdfium or pypdf do it properly.

Screenshots non applicable

Environment Info non applicable.

Additional context non applicable

CorentinvdBdO avatar Oct 05 '23 13:10 CorentinvdBdO

Hi @CorentinvdBdO - if this is still an issue, could you provide an example document we could use to reproduce?

MthwRobinson avatar May 15 '24 14:05 MthwRobinson

Hello @MthwRobinson , sorry for bumping and old issue but I'm having the same problem, and I can provide an example document: https://ddd.uab.cat/record/118259?ln=es trecom_a2006m12n21p7.pdf

Susensio avatar Jul 17 '24 14:07 Susensio