pdf-reader

crop text in 'Tj' PagesStrategy::OPERATORS

Open: msk-yv opened this issue 4 years ago • 1 comment

What I see in the PDF: [image]

The text I get when I call page.text: [image]

However, in page.raw_content I can see the complete date text: [image]

Can I be sure this is just date-format cropping? Or is it a systemic problem, so that where the PDF has '22.12.2019' I will get '22.12.20' instead of '22.12.19'?

msk-yv, Dec 04 '21 17:12
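
One way to check what the content stream really contains, independent of the layout step, is to pull the string operands of the Tj operators out of page.raw_content directly. The following is only a rough sketch: `tj_strings` is a hypothetical helper, not part of pdf-reader, and it assumes simple `(...) Tj` literals (no hex strings, no TJ arrays, no font-specific encodings).

```ruby
# Hypothetical helper, not part of pdf-reader: collect the literal strings
# shown by Tj operators in a raw content stream. Only plain "(...) Tj"
# literals with backslash escapes are handled.
def tj_strings(raw_content)
  raw_content.scan(/\(((?:\\.|[^\\()])*)\)\s*Tj/).map do |(literal)|
    # Simplified unescape: turns \( \) \\ into ( ) \ and ignores
    # other escape sequences such as \n or octal codes.
    literal.gsub(/\\(.)/m, '\1')
  end
end

# Made-up content stream standing in for page.raw_content.
sample = "BT /F1 10 Tf 72 700 Td (22.12.2019) Tj ET"
puts tj_strings(sample).inspect
# prints ["22.12.2019"]
```

If the full date shows up here but not in page.text, the characters are being lost after the operators are read, not in the PDF itself.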

This is likely to be the fault of the primitive algorithm in PageLayout. I'd love to find time to improve it!

The algorithm sometimes positions characters so that they overlap, in which case some of the characters are left out.

yob, Dec 12 '21 11:12
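
To make the overlap explanation concrete, here is a toy model of how a grid-based text layout can drop characters. It is an illustration under assumptions, not pdf-reader's actual PageLayout code: `Run`, `layout_row`, `col_width`, and `cols` are all made-up names, and the rule "first character to claim a cell wins" is assumed purely for the demo.

```ruby
# Toy model, NOT pdf-reader's PageLayout: each text run has an x position
# (in points) and is written into a row of fixed-width character cells.
Run = Struct.new(:x, :text)

def layout_row(runs, col_width:, cols:)
  row = Array.new(cols) # nil marks an empty cell
  runs.each do |run|
    start = (run.x / col_width).round
    run.text.each_char.with_index do |ch, i|
      col = start + i
      # Skip characters that fall off the grid or land on an occupied cell;
      # the latter is how overlapping characters get left out.
      next if col >= cols || row[col]
      row[col] = ch
    end
  end
  row.map { |c| c || " " }.join.rstrip
end

# The amount is placed first and claims cells 8..14, so the last two
# characters of the date (cells 8 and 9) have nowhere to go.
runs = [Run.new(48.0, "1500.00"), Run.new(0.0, "22.12.2019")]
puts layout_row(runs, col_width: 6.0, cols: 20)
# prints 22.12.201500.00
```

In this model the date comes out as '22.12.20', which is a truncation of the text rather than a two-digit year format, matching the kind of loss described above.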