
Cannot see PDF form data

shubhpy opened this issue

Hey,

https://s3.ap-south-1.amazonaws.com/public-sharing-purpose/aoc_gooded_tech.pdf

I have been trying for a long time to read the data inside this PDF with a Python library, and somehow I landed here.

I tried opening this PDF but cannot see the data.

Background: when I open this PDF in Adobe Acrobat Reader, I get a security warning asking whether I trust the ... site, with the option to allow or block. When I allow it, I can see the data. When I open it in Preview (Mac), it doesn't ask anything and shows the data.

When I pass this PDF to any Python library (PyPDF2, tabula, PyMuPDF, pdfminer), I see only the fields, not the data in them.

One day I opened it in Preview (Mac) and exported it as a PDF. Voilà: it worked, and the libraries were able to read the full data.

Can anyone tell me how to get the full data in Python? Thanks.

shubhpy, Jan 21 '19

I'm having the same issue. I was able to work around it by opening the PDF in Chrome and then printing it as a PDF. This is the form I'm having problems with:

https://www.canada.ca/content/dam/ircc/migration/ircc/english/passport/forms/pdf/pptc153.pdf

chiencarlos11, Feb 20 '20

@shubhpy: your PDF is no longer available, so no analysis is possible.

@chiencarlos11: the PDF is an XFA form (https://en.wikipedia.org/wiki/XFA). By its nature, such a form is like a web page that is displayed dynamically in the viewer. The data can be extracted with get_fields()/get_form_text_fields().
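A minimal sketch of that extraction, assuming a recent pypdf is installed. `PdfReader.get_fields()` is pypdf's API (`get_form_text_fields()` is similar but returns only text fields); the helper names `acroform_values`, `xfa_datasets_xml` and `xfa_field_values` are hypothetical. If the AcroForm values come back empty because the data lives only in the XFA packet, the sketch falls back to reading the raw `/XFA` entry and flattening the `xfa:data` XML. The flattening is simplified: repeated sibling elements with the same name overwrite each other.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Namespace of the <xfa:datasets>/<xfa:data> packet in XFA forms.
XFA_DATA_NS = "http://www.xfa.org/schema/xfa-data/1.0/"


def acroform_values(path: str) -> Dict[str, Optional[str]]:
    """Map each AcroForm field name to its current value ("/V"), or None if unset."""
    from pypdf import PdfReader  # lazy import: the XML helper below needs only the stdlib

    fields = PdfReader(path).get_fields() or {}  # get_fields() returns None without a form
    values: Dict[str, Optional[str]] = {}
    for name, field in fields.items():
        value = field.get("/V")
        values[name] = None if value is None else str(value)
    return values


def xfa_datasets_xml(path: str) -> Optional[bytes]:
    """Return the raw XFA XML that holds the form data, or None if there is no XFA."""
    from pypdf import PdfReader

    root = PdfReader(path).trailer["/Root"]
    acroform = root.get("/AcroForm")
    if acroform is None:
        return None
    xfa = acroform.get_object().get("/XFA")
    if xfa is None:
        return None
    xfa = xfa.get_object()
    # /XFA is either one stream with the whole XDP document, or an array of
    # alternating packet names and streams: [name, stream, name, stream, ...].
    if isinstance(xfa, list):
        for name, stream in zip(xfa[0::2], xfa[1::2]):
            if str(name) == "datasets":
                return stream.get_object().get_data()
        return None
    return xfa.get_data()


def xfa_field_values(xml_bytes: bytes) -> Dict[str, str]:
    """Flatten the leaves under <xfa:data> into {"dotted.path": text}."""
    root = ET.fromstring(xml_bytes)
    # iter() also checks the root itself, so this works for a full XDP document
    # or for the datasets packet alone.
    data = next(root.iter(f"{{{XFA_DATA_NS}}}data"), None)
    out: Dict[str, str] = {}
    if data is None:
        return out

    def walk(elem: ET.Element, path: str) -> None:
        children = list(elem)
        if not children:  # leaf element: its text is the field value
            out[path] = (elem.text or "").strip()
            return
        for child in children:
            walk(child, f"{path}.{child.tag.split('}')[-1]}")

    for child in data:
        walk(child, child.tag.split("}")[-1])  # drop any namespace prefix
    return out
```

For example, `acroform_values("pptc153.pdf")` would list the fields pypdf sees, and when `xfa_datasets_xml("pptc153.pdf")` returns something other than `None`, passing that result to `xfa_field_values` lists the values stored in the XFA data.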

This issue can be closed.

pubpub-zz, May 28 '23

I'm closing this issue; feel free to provide updates to reopen it.

pubpub-zz, Jun 25 '23