pdfminer.six
pdfminer can't extract text from some PDF files but pypdf can?
Feature request
Thanks for your suggestion on improving pdfminer.six. To help us discuss and implement this request, please make sure to include the following information:
- There are a few types of PDF files that contain very detailed information and come in different styles.
- These PDF files contain images, but their text can be extracted without OCR. That's why pypdf can extract information from them.
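A minimal sketch of how the comparison described above could be reproduced without sharing the files themselves. It assumes both `pdfminer.six` and `pypdf` are installed; the helper names (`extract_with_pdfminer`, `extract_with_pypdf`, `compare_extractors`) are hypothetical and only for illustration. It reports how many non-whitespace characters each library recovers from the same file, which may help show whether pdfminer.six returns nothing where pypdf returns text.

```python
from typing import Callable, Dict


def extract_with_pdfminer(path: str) -> str:
    # Imported lazily so compare_extractors can be used without pdfminer.six installed.
    from pdfminer.high_level import extract_text
    return extract_text(path)


def extract_with_pypdf(path: str) -> str:
    # Imported lazily for the same reason as above.
    from pypdf import PdfReader
    reader = PdfReader(path)
    # Guard against pages that yield no text (e.g. image-only pages).
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def compare_extractors(
    path: str, extractors: Dict[str, Callable[[str], str]]
) -> Dict[str, int]:
    """Return the count of non-whitespace characters each extractor recovered.

    An extractor that raises is reported as -1 (a convention of this sketch).
    """
    counts: Dict[str, int] = {}
    for name, extract in extractors.items():
        try:
            text = extract(path)
        except Exception as exc:
            print(f"{name} failed: {exc!r}")
            counts[name] = -1
            continue
        counts[name] = len("".join(text.split()))
    return counts


# Example call (the file name is a placeholder):
# compare_extractors("sample.pdf", {
#     "pdfminer.six": extract_with_pdfminer,
#     "pypdf": extract_with_pypdf,
# })
```

Posting the resulting counts, together with the library versions, gives maintainers something concrete to look at even when the PDF cannot be attached publicly.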
Could you provide these PDF files here? Also, did those PDFs contain only images and no text? If so, how did you conclude that OCR was not used and text was still extracted?
Thanks for your response. As I said, pypdf extracted text from those files; they contain both images and text. The task is to extract the text, not the images. I can't provide the files here, but I will be happy to share them by email. You can send email here
I have sent an email; kindly share your files there.
I didn't get your email address. Can you please send it again to this address? [email protected]
I have sent the reply again to the email address mentioned above. Please also check the Spam/Junk folder of your inbox.