pdf2htmlEX
Correct text visibility renders text as a part of the image
The --correct-text-visibility flag sometimes renders text as part of nearby images, which makes that text blurry.
A sample PDF is linked here. See the 5th page.
https://www.dropbox.com/s/w5q124rn4zqhx3f/1.pdf?dl=0
Is there a possible solution to overcome this issue?
[Screenshot: screen shot 2016-06-16 at 6 08 30 pm]
Please attach the PDF directly.
This may be because those characters' bounding boxes overlap slightly, so --correct-text-visibility decides they should be rendered as an image to achieve correct visibility. Maybe I should treat slightly overlapping characters as not overlapping. However, this won't solve all blurring problems.
--bg-format svg should avoid the text blurring problems, although the rendering difference between HTML and SVG is usually still noticeable.
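To illustrate the idea of treating slightly overlapping characters as not overlapping, here is a minimal Python sketch. It is not pdf2htmlEX's actual code: the `Box` type, the function names, and the 5% tolerance are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box (hypothetical type, not from pdf2htmlEX)."""
    x0: float
    y0: float
    x1: float
    y1: float


def area(box: Box) -> float:
    """Area of a box; degenerate boxes count as zero."""
    return max(0.0, box.x1 - box.x0) * max(0.0, box.y1 - box.y0)


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two boxes, or 0.0 if they do not intersect."""
    width = min(a.x1, b.x1) - max(a.x0, b.x0)
    height = min(a.y1, b.y1) - max(a.y0, b.y0)
    if width <= 0 or height <= 0:
        return 0.0
    return width * height


def significantly_overlaps(a: Box, b: Box, tolerance: float = 0.05) -> bool:
    """True only if the overlap covers more than `tolerance` of the smaller box.

    The default of 0.05 is an arbitrary assumption, not a value used by pdf2htmlEX.
    """
    smaller = min(area(a), area(b))
    if smaller <= 0:
        return False
    return overlap_area(a, b) / smaller > tolerance


glyph = Box(0, 0, 10, 10)
neighbour = Box(9.8, 0, 19.8, 10)   # boxes touch by a thin sliver
covering = Box(5, 0, 15, 10)        # covers half of the glyph

print(significantly_overlaps(glyph, neighbour))
print(significantly_overlaps(glyph, covering))
```

With a check like this, glyphs whose boxes merely touch would stay as HTML text, and only substantially covered text would be pushed into the background image. As noted above, this would not remove every case of blurring.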
1.pdf
Here I have attached the PDF directly. --bg-format svg partially fixes the problem, but it causes pieces of text to fall on top of one another when they are near the image.
Hi,
First, thanks for the great job with this lib!
I have the same issue with overlapping text rendered as an image. Even with the svg format, the text difference is quite visible. Is there a fix coming?
Regards