Failed to load PDF file

Open sumitm815 opened this issue 5 years ago • 5 comments

I am using Tesseract 3.3 to extract text from images and PDFs. It works fine when I extract text from an image, but when I pass a PDF file it throws an error.

Pix img = Pix.LoadFromFile(pdfFilePath);

Please help.

sumitm815 avatar Jun 14 '19 11:06 sumitm815

Loading from PDF isn't supported. You'll need to use a different library to load it and then extract the images.

charlesw avatar Jun 15 '19 00:06 charlesw

Thanks! Could you please suggest a library for converting PDF pages to images, so that I can then pass them to Tesseract?

sumitm815 avatar Jun 17 '19 10:06 sumitm815

iTextSharp or ABCpdf.

tdhintz avatar Jun 17 '19 10:06 tdhintz

I use Docnet.Core https://www.nuget.org/packages/Docnet.Core/

ferronsw avatar Jun 20 '19 06:06 ferronsw

You can also use a Ghostscript wrapper or the MuPDF SDK and convert the PDF to TIFF. There is a working example in the VietOCR source code.

harshgsx avatar Jun 27 '19 11:06 harshgsx
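
Putting the answers above together, here is a minimal sketch of the render-then-OCR approach using Docnet.Core, one of the libraries suggested in this thread. It is an illustration under assumptions, not a method confirmed by the maintainers: it assumes a recent Docnet.Core release (the API that takes `PageDimensions`), the `PixConverter.ToPix(Bitmap)` helper from the 3.x .NET wrapper, and System.Drawing being available. The class name `PdfOcr`, the helper `ToOpaqueBitmap`, the `./tessdata` path, the `eng` language, and the 1080x1920 render size are all hypothetical choices for the example.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;
using System.Text;
using Docnet.Core;
using Docnet.Core.Models;
using Tesseract;

// Hypothetical helper class: renders PDF pages to bitmaps, then OCRs them.
static class PdfOcr
{
    // tessdataPath and language are assumptions; adjust them to your setup.
    public static string ExtractText(string pdfFilePath,
                                     string tessdataPath = "./tessdata",
                                     string language = "eng")
    {
        var text = new StringBuilder();

        using (var engine = new TesseractEngine(tessdataPath, language, EngineMode.Default))
        // The render size is an assumed value; larger pages may OCR better.
        using (var docReader = DocLib.Instance.GetDocReader(pdfFilePath, new PageDimensions(1080, 1920)))
        {
            for (int i = 0; i < docReader.GetPageCount(); i++)
            {
                using (var pageReader = docReader.GetPageReader(i))
                using (var bitmap = ToOpaqueBitmap(pageReader.GetImage(),
                                                   pageReader.GetPageWidth(),
                                                   pageReader.GetPageHeight()))
                using (var pix = PixConverter.ToPix(bitmap)) // a Pix built from an image, not from the PDF
                using (var page = engine.Process(pix))
                {
                    text.AppendLine(page.GetText());
                }
            }
        }

        return text.ToString();
    }

    // Copies the raw BGRA bytes returned by Docnet.Core into a 32bpp bitmap,
    // then draws it onto a white 24bpp bitmap so that transparent page
    // backgrounds do not end up black.
    static Bitmap ToOpaqueBitmap(byte[] bgra, int width, int height)
    {
        using (var source = new Bitmap(width, height, PixelFormat.Format32bppArgb))
        {
            var rect = new Rectangle(0, 0, width, height);
            BitmapData data = source.LockBits(rect, ImageLockMode.WriteOnly, source.PixelFormat);
            try
            {
                Marshal.Copy(bgra, 0, data.Scan0, bgra.Length);
            }
            finally
            {
                source.UnlockBits(data);
            }

            var flattened = new Bitmap(width, height, PixelFormat.Format24bppRgb);
            using (var g = Graphics.FromImage(flattened))
            {
                g.Clear(Color.White);
                g.DrawImage(source, 0, 0, width, height);
            }
            return flattened;
        }
    }
}
```

With this in place, `PdfOcr.ExtractText(pdfFilePath)` would replace the failing `Pix.LoadFromFile(pdfFilePath)` call for PDF inputs, while image files can still go through `Pix.LoadFromFile` directly. The same structure should carry over to the other suggested libraries: only the page-rendering step changes.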