tesseract
Failed to load PDF file
I am using Tesseract 3.3 to extract text from images and PDFs. It works fine when I extract text from an image, but when I pass a PDF file it throws an error.
Pix img = Pix.LoadFromFile(pdfFilePath);
Please help.
Loading from PDF isn't supported. You'll need to use a different library to load it and then extract the images.
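To make this failure explicit before `Pix.LoadFromFile` is called, a small guard can check whether the input is a PDF. This is only a sketch: `PdfGuard` and `LooksLikePdf` are hypothetical names, not part of the Tesseract wrapper, and the check assumes the file begins with the usual `%PDF-` signature.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the Tesseract wrapper.
public static class PdfGuard
{
    // PDF files normally begin with the ASCII signature "%PDF-".
    private static readonly byte[] Signature = Encoding.ASCII.GetBytes("%PDF-");

    // Returns true when the file starts with the "%PDF-" signature.
    public static bool LooksLikePdf(string path)
    {
        var header = new byte[Signature.Length];
        int total = 0;
        using (var stream = File.OpenRead(path))
        {
            // Read may return fewer bytes than requested, so loop until full or EOF.
            while (total < header.Length)
            {
                int n = stream.Read(header, total, header.Length - total);
                if (n == 0) break;
                total += n;
            }
        }
        if (total < header.Length) return false;

        for (int i = 0; i < Signature.Length; i++)
        {
            if (header[i] != Signature[i]) return false;
        }
        return true;
    }
}
```

If `PdfGuard.LooksLikePdf(pdfFilePath)` returns true, the file would need to be rendered to images by another library first; otherwise it can go to `Pix.LoadFromFile` as before.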
Thanks! Could you please suggest a library for converting a PDF to images, so that I can then pass them to Tesseract?
iTextSharp or ABCpdf.
I use Docnet.Core https://www.nuget.org/packages/Docnet.Core/
You can also use a Ghostscript wrapper or the MuPDF SDK to convert the PDF to TIFF. There is a working example in the VietOCR source code.
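For the Ghostscript route, one possible approach is to call the Ghostscript command-line tool directly instead of going through a wrapper library. The sketch below is not the VietOCR code. `GhostscriptTiff`, `BuildArguments`, and `Convert` are hypothetical names, and it assumes Ghostscript is installed and that the executable name is supplied by the caller (typically `gs` on Linux/macOS or `gswin64c` on Windows).

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper that shells out to the Ghostscript command-line tool.
public static class GhostscriptTiff
{
    // Builds arguments that render each PDF page to its own grayscale TIFF.
    // Ghostscript replaces "%03d" in the output pattern with the page number.
    public static string BuildArguments(string pdfPath, string outputPattern, int dpi)
    {
        return "-dSAFER -dBATCH -dNOPAUSE -sDEVICE=tiffgray " +
               $"-r{dpi} \"-sOutputFile={outputPattern}\" \"{pdfPath}\"";
    }

    // Runs Ghostscript and returns its exit code (0 means success).
    // gsExecutable is an assumption supplied by the caller, e.g. "gs" or "gswin64c".
    public static int Convert(string gsExecutable, string pdfPath, string outputPattern, int dpi = 300)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = gsExecutable,
            Arguments = BuildArguments(pdfPath, outputPattern, dpi),
            UseShellExecute = false,
            CreateNoWindow = true
        };
        using (var process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

For example, `GhostscriptTiff.Convert("gs", pdfFilePath, "page-%03d.tif")` should produce `page-001.tif`, `page-002.tif`, and so on, and each of those files can then be passed to `Pix.LoadFromFile` as in the original question.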