
Correct text visibility renders text as a part of the image

Open rmit-s3578933-shiran-ekanayake opened this issue 8 years ago • 3 comments

The --correct-text-visibility flag sometimes renders text as part of nearby images, which makes that text blurry.

A sample PDF is linked below. See the 5th page.

https://www.dropbox.com/s/w5q124rn4zqhx3f/1.pdf?dl=0

Is there a possible solution to overcome this issue?

[Image: screenshot, 2016-06-16 at 6:08:30 PM]

Please attach the PDF directly.

This may be because those characters' bounding boxes overlap the image slightly, so --correct-text-visibility decides they should be rendered as part of the image to achieve correct visibility. Maybe I should treat slightly overlapping characters as not overlapping. However, this won't solve all blurring problems.
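The idea of treating slight overlaps as no overlap can be sketched as follows. This is a minimal illustration only, not pdf2htmlEX's actual code: the `Box` type, the function names, and the 10% `min_fraction` threshold are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box (hypothetical type for this sketch)."""
    x0: float
    y0: float
    x1: float
    y1: float


def area(box: Box) -> float:
    """Area of a box; degenerate boxes have zero area."""
    return max(0.0, box.x1 - box.x0) * max(0.0, box.y1 - box.y0)


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two boxes, or 0.0 if they are disjoint."""
    w = min(a.x1, b.x1) - max(a.x0, b.x0)
    h = min(a.y1, b.y1) - max(a.y0, b.y0)
    return w * h if w > 0 and h > 0 else 0.0


def significantly_overlaps(glyph: Box, other: Box, min_fraction: float = 0.1) -> bool:
    """True only if `other` covers at least `min_fraction` of the glyph's box.

    The threshold is an assumed value; a real implementation would need tuning.
    """
    glyph_area = area(glyph)
    if glyph_area == 0:
        return False
    return overlap_area(glyph, other) / glyph_area >= min_fraction
```

With this check, a glyph whose box merely touches the edge of an image would stay as HTML text, while a glyph that is substantially covered would still be rendered into the background image.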

--bg-format svg should avoid the text blurring problems; however, the rendering difference between HTML and SVG is usually still noticeable.
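For reference, a small sketch of how the two flags mentioned in this thread might be combined when invoking the tool from a script. The helper name `build_command` is hypothetical, and passing `1` as the value for --correct-text-visibility is an assumption; check `pdf2htmlEX --help` for the values your build accepts.

```python
import shlex


def build_command(pdf_path: str, bg_format: str = "svg",
                  correct_text_visibility: int = 1) -> list[str]:
    """Build an argument list for pdf2htmlEX (hypothetical helper).

    The integer passed to --correct-text-visibility is an assumed value.
    """
    return [
        "pdf2htmlEX",
        "--correct-text-visibility", str(correct_text_visibility),
        "--bg-format", bg_format,
        pdf_path,
    ]


# Show the command line; pass the list to subprocess.run() to execute it.
print(shlex.join(build_command("1.pdf")))
```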

duanyao avatar Jun 16 '16 11:06 duanyao

1.pdf

Here I have attached the PDF directly. --bg-format svg partially fixes the problem, but it causes text to fall on top of other text when it comes close to the image.

Hi,

First, thanks for the great work on this library!

I have the same issue with overlapping text being rendered as an image. Even with the svg format, the text difference is quite visible. Is a fix coming?

Regards

eherve avatar Apr 10 '17 13:04 eherve