
After using updatePageFormFieldValues PyPDF2 cannot read fields with getFormTextFields

Open dossjh opened this issue 5 years ago • 2 comments

After I update a page's fields, they can no longer be read with PyPDF2. I am using the NeedAppearances trick to make them visible in my PDF viewer (pdf-xchange).

If I open the files in pdf-xchange and close them, I can read the fields with PyPDF2 again.

I noticed that the document info of the updated files no longer contains the /Fields entry:

original document: {'/ModDate': "D:20180708222539-06'00'", '/Producer': 'PyPDF2', '/Fields': [IndirectObject(3, 0), IndirectObject(4, 0), IndirectObject(5, 0)]}

updated fields: {'/NeedAppearances': <PyPDF2.generic.BooleanObject object at 0x0000026DB2265470>, '/Producer': 'PyPDF2'}

I am not sure how to add the /Fields entry back.

Thanks

dossjh avatar Jul 09 '18 04:07 dossjh

This is related to, if not the same as, issue #355.

mwhit74 avatar Aug 07 '18 19:08 mwhit74

Thank you for sharing that observation. I've created an example to confirm it:

from PyPDF2 import PdfReader, PdfWriter

reader = PdfReader("resources/form.pdf")
writer = PdfWriter()
# add_page() copies only the page, not the document-level /AcroForm entry
writer.add_page(reader.pages[0])
writer.write("forms-after-writing.pdf")

Within resources/form.pdf, we have /AcroForm 22 0 R within the Catalog:

34 0 obj
<<
  /Type /Catalog
  /Pages 21 0 R
  /Names 33 0 R
  /PageMode /UseNone
  /AcroForm 22 0 R
  /OpenAction 1 0 R
>>
endobj

That object is the interactive form (AcroForm) dictionary, which looks like this:

22 0 obj
<<
  /Fields [ 15 0 R ]
  /DR <<
    /Font <<
      /ZaDb 5 0 R
      /Helv 6 0 R
    >>
  >>
  /DA (/Helv 10 Tf 0 g)
  /NeedAppearances true
>>
endobj

After executing the script above, the Catalog looks like this:

4 0 obj
<<
/Type /Catalog
/Pages 1 0 R
>>
endobj

MartinThoma avatar Aug 06 '22 11:08 MartinThoma

@MartinThoma was correct: to copy the /AcroForm tree, you have to use append().

We can close this issue.

pubpub-zz avatar May 28 '23 14:05 pubpub-zz