pdf2image

convert_from_path not rendering Greek letters

Open • bvilmann opened this issue on Aug 15 '21 • 3 comments

I've noticed that convert_from_path does not render everything in the PDF, for example Greek letters:

This is the PDF file (I highlighted the disappearing letter): [image]

This is the image returned by convert_from_path: [image]

Any ideas why this happens?

bvilmann, Aug 15 '21 13:08

Hi! Sorry I'm late here. Could you test whether pdftoppm -r 300 -jpeg your_file.pdf out renders properly? If it doesn't, the issue is in the underlying library (Poppler, which provides pdftoppm), and unfortunately I don't have much control over that.

That said, it looks like a font issue, so running pdftoppm should print some output that might point you in the right direction.

Belval, Aug 31 '21 12:08
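A minimal sketch of the check suggested above, written in Python since pdf2image is itself a Python wrapper around pdftoppm. It runs the same pdftoppm -r 300 -jpeg command and collects font names from Poppler's "No display font for '...'" warnings on stderr. The helper names run_pdftoppm and missing_fonts are hypothetical, and the regex assumes the warning wording quoted later in this thread.

```python
import re
import shutil
import subprocess

# Assumed wording of Poppler's warning, e.g. "Syntax Error: No display font for 'ArialUnicode'"
MISSING_FONT_RE = re.compile(r"No display font for '([^']+)'")


def missing_fonts(stderr_text: str) -> list[str]:
    """Return the unique font names Poppler reported as missing, in first-seen order."""
    seen: list[str] = []
    for name in MISSING_FONT_RE.findall(stderr_text):
        if name not in seen:
            seen.append(name)
    return seen


def run_pdftoppm(pdf_path: str, out_prefix: str = "out", dpi: int = 300) -> list[str]:
    """Render pdf_path to JPEG files with pdftoppm and return any fonts it could not display."""
    if shutil.which("pdftoppm") is None:
        raise FileNotFoundError("pdftoppm not found; install Poppler (e.g. poppler-utils)")
    result = subprocess.run(
        ["pdftoppm", "-r", str(dpi), "-jpeg", pdf_path, out_prefix],
        capture_output=True,
        text=True,
    )
    # Font warnings do not make pdftoppm fail; a non-zero exit means a real error
    # such as an unreadable or missing PDF.
    if result.returncode != 0:
        raise RuntimeError(f"pdftoppm failed ({result.returncode}): {result.stderr.strip()}")
    return missing_fonts(result.stderr)
```

An empty list from run_pdftoppm("your_file.pdf") means pdftoppm printed no missing-display-font warnings; any names it returns are the fonts worth installing or substituting.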

Hi Belval,

Sorry for the long delay on my side too. I guess it is out of your hands, then. Running pdftoppm returns:

Syntax Error: No display font for 'ArialUnicode'

If you have no further suggestions, you can close this issue. Thanks for your reply :)

Update: I looked up ArialUnicode and see that there are references to Greek standards. Which library do you use for fonts?

bvilmann, Oct 05 '21 21:10

That depends on your OS. Fortunately, fonts are shared system-wide, so installing a LaTeX font package (sudo apt install texlive-fonts-extra) would probably fix your issue. For Arial, the equivalent is DejaVuSans, which you can get with sudo apt install ttf-dejavu, iirc.

Belval, Oct 07 '21 19:10
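After installing a font package, one way to confirm that a Greek-capable font is actually visible is to ask fontconfig which families cover Greek (language code el). This is a minimal sketch, assuming a Linux system where Poppler resolves substitute fonts through fontconfig and where the fc-list tool is installed; the helper names parse_families and greek_capable_families are hypothetical.

```python
import shutil
import subprocess


def parse_families(fc_list_output: str) -> list[str]:
    """Turn `fc-list ... family` output (one font per line, aliases comma-separated)
    into a sorted list of unique family names."""
    names = {
        name.strip()
        for line in fc_list_output.splitlines()
        for name in line.split(",")
        if name.strip()
    }
    return sorted(names)


def greek_capable_families() -> list[str]:
    """List installed font families whose language coverage includes Greek."""
    if shutil.which("fc-list") is None:
        raise FileNotFoundError("fc-list not found; is fontconfig installed?")
    result = subprocess.run(
        ["fc-list", ":lang=el", "family"],  # pattern: fonts supporting Greek; print only family
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_families(result.stdout)
```

If the list comes back empty, no installed font covers Greek. From a shell, fc-match ArialUnicode similarly shows which installed font fontconfig would pick as a substitute for the missing one.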