Arabic letters are extracted as ???
Hello, I am trying to extract a table from a PDF that contains Arabic letters, but when I extract the table I get ??? in place of every Arabic letter. This happens only when I use the jar; when I use the web app I get the data without any issue.
How can I solve this?
@yovrer Open your document in Acrobat Reader and then press Command + D on macOS (or Control + D on Windows, I believe). This should bring up the Document Properties dialog. Under the Fonts tab, do you see something like:
Type: TrueType (CID)
Encoding: Identity-H
Thanks @rayleeriver for replying. Yes, I see what you described in the Fonts tab.
What does that mean?
CID/Identity-H fonts make the text impossible to parse. See Adobe's own answer: https://community.adobe.com/t5/acrobat/font-encoding-settings-removing-identity-h-encoding/td-p/10605220?page=1
I also tried Acrobat Pro DC's preflight trick, with no success. I was lucky that we were able to change the font selection in our vendor's tool to one that is NOT a CID font. After that, we were able to extract the table data via Tabula.
But why can the web app extract the Arabic words from the PDF without any issue? Why does the problem happen only when I use the jar?
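
One possible explanation, not confirmed anywhere in this thread: if the same PDF extracts correctly in the web app, the text may be decoded fine and then lost on output, because the JVM running the jar writes with a default charset that cannot represent Arabic. The sketch below only illustrates that mechanism with the Java standard library. The class name `EncodingDemo` and the method `roundTrip` are hypothetical names made up for this illustration and have nothing to do with Tabula's own code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Encode text into the given charset and decode it back, which is roughly
    // what happens when text is written to a console or file and read again.
    public static String roundTrip(String text, Charset charset) {
        // Characters the charset cannot represent are replaced by its
        // replacement byte, which is '?' for US-ASCII.
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // A five-letter Arabic word, written with escapes so the source file
        // encoding does not matter.
        String arabic = "\u0645\u0631\u062D\u0628\u0627";

        // Shows which charset this JVM uses by default.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Every Arabic letter is lost: the result is the string "?????".
        System.out.println("US-ASCII round trip: " + roundTrip(arabic, StandardCharsets.US_ASCII));

        // UTF-8 can represent Arabic, so the text survives unchanged.
        System.out.println("UTF-8 round trip intact: " + roundTrip(arabic, StandardCharsets.UTF_8).equals(arabic));
    }
}
```

If the default charset printed on your machine is not UTF-8, it may be worth starting the jar with the JVM option `-Dfile.encoding=UTF-8` (placed before `-jar`) and comparing the output. Whether that resolves this particular case is an assumption; the CID/Identity-H font issue described above may still apply.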