pypdf
`update_page_form_field_values` fails on pdf with same field on multiple pages.
I have this pdf file with some fields duplicated on multiple pages. When I try to fill any of those fields (for example, "n et p") using `update_page_form_field_values`, it fails with `KeyError: '/AP'`.
My wild guess is that this happens because `update_page_form_field_values` takes a single page to update, while the same field is duplicated multiple times across the whole document.
Side note: pdftk handles this well, but I'm looking for a native Python solution.
Environment
$ python -m platform
Windows-10-10.0.22631-SP0
$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==3.16.2, crypt_provider=('cryptography', '41.0.4'), PIL=none
Code + PDF
This is a minimal, complete example that shows the issue:
from pypdf import PdfReader, PdfWriter
reader = PdfReader("634_empty.pdf")
writer = PdfWriter()
# Fill the PDF
writer.append(reader)
fields = reader.get_fields()
page_1 = {
    "n et p": "test",
}
writer.update_page_form_field_values(writer.pages[1], page_1)
with open("test_output.pdf", "wb") as output_stream:
    writer.write(output_stream)
I'm sharing the pdf file that causes the issue, but I'm not the author, so I don't think it can be included in tests.
Traceback
This is the complete (redacted) Traceback I see:
Traceback (most recent call last):
  File "C:\..\dap_form.py", line 83, in validate_data
    fill_dap_pdf(v, "test_output.pdf")
  File "C:\..\dap_generate.py", line 45, in fill_dap_pdf
    writer.update_page_form_field_values(writer.pages[1], page_1)
  File "C:\..\venv\Lib\site-packages\pypdf\_writer.py", line 1072, in update_page_form_field_values
    value if value in k[AA.AP]["/N"] else "/Off"
                      ~^^^^^^^
  File "C:\..\venv\Lib\site-packages\pypdf\generic\_data_structures.py", line 320, in __getitem__
    return dict.__getitem__(self, key).get_object()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: '/AP'
Your form uses a special field that is synchronized between multiple pages. Thanks for the example. However, it will take a little time to find a fix.
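To picture what a field "synchronized between multiple pages" looks like, here is a minimal sketch using plain Python dicts rather than pypdf objects. It assumes the common layout in which one field dictionary holds the name (/T) and value (/V), and each page carries a widget that points back to it through /Parent. The `inherited` helper is hypothetical and only illustrates the lookup.

```python
# Simplified model (plain dicts, not pypdf objects): one field, two widgets.
parent = {"/T": "n et p", "/FT": "/Tx", "/V": "", "/Kids": []}
page_1_widget = {"/Subtype": "/Widget", "/Parent": parent}
page_2_widget = {"/Subtype": "/Widget", "/Parent": parent}
parent["/Kids"] = [page_1_widget, page_2_widget]


def inherited(widget, key):
    """Hypothetical helper: look up `key` on the widget, then on its ancestors."""
    node = widget
    while node is not None:
        if key in node:
            return node[key]
        node = node.get("/Parent")
    return None


# Setting the value once on the shared field is visible from both widgets.
parent["/V"] = "test"
values = [inherited(w, "/V") for w in (page_1_widget, page_2_widget)]
```

In this model neither widget stores /V itself, so code that only inspects the annotations of a single page has to follow /Parent to find the field it belongs to.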
I just ran into a similar issue on a single page PDF where the same field is repeated.
This try/catch fixes it, but it's probably not the right way to go about it.
https://github.com/py-pdf/pypdf/pull/2333
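For reference, the failing expression in the traceback indexes /AP directly on each widget. A guarded variant of that lookup could look like the sketch below. It uses plain dicts as stand-ins for pypdf objects, the helper name `appearance_state` is made up for illustration, and it is not necessarily what the linked PR does.

```python
def appearance_state(widget: dict, value: str) -> str:
    """Return the /AS state for a widget, tolerating a missing /AP entry."""
    normal_states = widget.get("/AP", {}).get("/N", {})
    return value if value in normal_states else "/Off"


kids = [
    {"/AP": {"/N": {"/Yes": None, "/Off": None}}},  # widget with appearances
    {},  # widget without an appearance dictionary
]

# The unguarded lookup raises on the second widget, as in the traceback.
try:
    kids[1]["/AP"]["/N"]
except KeyError as exc:
    missing_key = exc.args[0]

# The guarded lookup falls back to "/Off" instead of raising.
states = [appearance_state(kid, "/Yes") for kid in kids]
```

Whether silently falling back to "/Off" is the right behaviour for such a widget is a separate question; the sketch only shows how the KeyError can be avoided.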
A hacky way to make it work is going into adobe acrobat and renaming the fields to include a suffix.
For example mine wasn't working when the field was called USER_NAME but when I changed it to USER_NAME_1 it started working and I wasn't even using the field in multiple pages but I was having the same issue.
As said, having several Annotations/Widgets referring to the same field is normal: it allows the filled data to be reported on multiple pages. The solution you are proposing consists in building new fields with new names.
The PDF in here has not been built properly: it has duplicated fields with the same names that are not attached properly. The extra forms should first have been prepared by adding a grouping field, as stated in the documentation: https://pypdf.readthedocs.io/en/stable/user/merging-pdfs.html#merging-forms
@antonio-cinnamon / @oeble please have a try
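As an illustration of why a grouping field helps: assuming it works as the linked documentation describes (a new top-level field whose children are the existing fields), two fields with the same partial name end up with different fully qualified names, joined by a period. The sketch below models this with plain dicts; `qualified_name` is a hypothetical helper, not a pypdf function, and the names "form1" and "form2" are placeholders.

```python
def qualified_name(field: dict) -> str:
    """Hypothetical helper: join the /T names from the top-level field down."""
    parts = []
    node = field
    while node is not None:
        if "/T" in node:
            parts.append(node["/T"])
        node = node.get("/Parent")
    return ".".join(reversed(parts))


# Two documents, each with a field called USER_NAME, grouped under
# different top-level names before merging.
group_1 = {"/T": "form1"}
group_2 = {"/T": "form2"}
user_name_1 = {"/T": "USER_NAME", "/Parent": group_1}
user_name_2 = {"/T": "USER_NAME", "/Parent": group_2}

names = [qualified_name(user_name_1), qualified_name(user_name_2)]
```

With distinct qualified names the two fields no longer collide after a merge, without renaming the fields themselves.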
Thx. I'll look into this.
Can you elaborate on the usage of this or provide an example? I have played around with reader.add_form_topname("form1"), but have yet to be able to use it to solve this issue without discovering more.
Below is my specific usage thus far:
from pypdf import PdfReader, PdfWriter
myFiles = {
    "test1": {
        "name": "Test1 Form",
        "path": "test1.pdf",
        "usage": {
            "fields": {
                "First Name": "Reed",
                "Middle Name": "R",
                "MM": "04",
                "DD": "21",
                "YY": "24",
                "Initial": "RRG",
                # "I DO NOT Agree": null,
                # "Last Name": null
            },
        }
    },
    "test2": {
        "name": "Test2 Form",
        "path": "test2.pdf",
        "usage": {
            "fields": {
                "p2 First Name": "Joe",
                "p2 Middle Name": "S",
                "p2 MM": "03",
                "p2 DD": "31",
                "p2 YY": "24",
                "Initial": "JSS",
                # "p2 I DO NOT Agree": "null",
                "p2 Last Name": "Smith",
                "p3 First Name": "John",
                "p3 Middle Name": "R",
                "p3 MM": "01",
                "p3 DD": "25",
                "p3 YY": "21"
            },
        }
    }
}
pdfOut = "merged.pdf"
merger = PdfWriter()
for file in myFiles:
    reader = PdfReader(myFiles[file]["path"])
    reader.add_form_topname(file)
    writer = PdfWriter()
    writer.append(reader)
    # Update form fields for each page in the current PDF
    for page in range(len(reader.pages)):
        writer.update_page_form_field_values(
            writer.pages[page],
            myFiles[file]["usage"]["fields"]
        )
    # Append the pages directly to the final_writer
    for page in writer.pages:
        merger.add_page(page)
# Write the merged PDF to the output file
with open(pdfOut, "wb") as f:
    merger.write(f)
In this, I am iterating through a dictionary of documents, filling these required documents, and then merging all of the required documents I have. I get this result because I am unfamiliar with how to use the aforementioned function add_form_topname:
Traceback (most recent call last):
  File "C:\Users\range\CodingProjects\RGBZ\Aeri4l\AllofPermitFly\PermitFlyHelper\functions\functions\standalone.py", line 54, in <module>
    writer.update_page_form_field_values(
  File "C:\Users\range\CodingProjects\RGBZ\Aeri4l\AllofPermitFly\PermitFlyHelper\functions\functions\venv\lib\site-packages\pypdf\_writer.py", line 955, in update_page_form_field_values
    value if value in k[AA.AP]["/N"] else "/Off"
  File "C:\Users\range\CodingProjects\RGBZ\Aeri4l\AllofPermitFly\PermitFlyHelper\functions\functions\venv\lib\site-packages\pypdf\generic\_data_structures.py", line 319, in __getitem__
    return dict.__getitem__(self, key).get_object()
KeyError: '/AP'
@ReedGraff
Can you complete your code so that it is fully self-contained (merger is never declared)?
Also remember that working with forms does not allow the use of add_page(); you have to copy/duplicate not only the pages but also the /AcroForm section. To do that you need to use append (possibly with the pages parameter to select a partial set of pages).
Can you also provide your output result?
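The advice above can be sketched as a small helper. `PdfWriter.append` and its `pages` parameter are pypdf's; the helper name `append_with_forms` and the `RecordingWriter` stand-in (used only so the snippet runs without any PDF files) are assumptions made for illustration.

```python
def append_with_forms(merger, sources):
    """Append each source with merger.append() rather than add_page().

    `sources` is an iterable of (source, pages) pairs; `pages` may be None
    (all pages) or a (start, stop) tuple selecting a subset.
    """
    for source, pages in sources:
        if pages is None:
            merger.append(source)
        else:
            merger.append(source, pages=pages)


class RecordingWriter:
    """Hypothetical stand-in for pypdf.PdfWriter that only records calls."""

    def __init__(self):
        self.calls = []

    def append(self, fileobj, pages=None):
        self.calls.append((fileobj, pages))


merger = RecordingWriter()
append_with_forms(merger, [("test1.pdf", None), ("test2.pdf", (0, 2))])
calls = merger.calls
```

With pypdf, `merger` would be a `PdfWriter()` and each source a `PdfReader` or a file path; the point is only that every document goes through `append`, so the form data travels with the pages.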
I have updated the previous message, Happy Easter!
This is possible with another library, which is my solution at the moment:
from pdfrw import PdfReader, PdfDict, PdfName, PdfObject, PdfWriter
ANNOT_KEY = '/Annots'
ANNOT_FIELD_KEY = '/T'
ANNOT_VAL_KEY = '/V'
ANNOT_RECT_KEY = '/Rect'
SUBTYPE_KEY = '/Subtype'
WIDGET_SUBTYPE_KEY = '/Widget'
# ....
pdfOut = "/tmp/merged.pdf"
fields = ()
writer = PdfWriter()
for file in my_instance._uniformRequirements:
    pages = PdfReader("storage/" + my_instance._uniformRequirements[file]["path"]).pages
    for page in pages:
        annotations = page["/Annots"]
        for annotation in annotations:
            if annotation[SUBTYPE_KEY] == WIDGET_SUBTYPE_KEY:
                if annotation[ANNOT_FIELD_KEY]:
                    key = annotation[ANNOT_FIELD_KEY][1:-1]
                    # annotation.update(PdfDict(T='CHANGED ' + key))
                    if key in my_instance._uniformRequirements[file]["usage"]["fields"]:
                        if key in fields:
                            key = "_" + key
                            annotation.update(PdfDict(T=key))
                        annotation.update(PdfDict(V=my_instance._uniformRequirements[file]["usage"]["fields"][key.lstrip('_')]))
                        annotation.update(PdfDict(AP=''))
                        # print(key)
                        fields += (key,)
    writer.addpages(pages)
writer.write(pdfOut)
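The de-duplication step in the code above prefixes a repeated name with "_" once. The same idea, isolated from any PDF library and using a numbered suffix so that a third occurrence also gets its own name, might look like the sketch below. `unique_field_names` is a hypothetical helper, and it assumes no original field is already named like one of the generated names.

```python
from collections import Counter


def unique_field_names(names):
    """Return names in order, suffixing repeats with _1, _2, ..."""
    seen = Counter()
    result = []
    for name in names:
        seen[name] += 1
        if seen[name] == 1:
            result.append(name)
        else:
            result.append(f"{name}_{seen[name] - 1}")
    return result


renamed = unique_field_names(["Initial", "MM", "Initial", "Initial"])
```

As noted earlier in the thread, renaming creates new, independent fields, so this is a workaround rather than a way to keep one shared field in sync.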
I've prepared a PR to fix this issue. I've also reviewed/improved the test code:
from pypdf import PdfReader, PdfWriter
myFiles = {
    "test1": {
        "name": "Test1 Form",
        "path": "test1.pdf",
        "usage": {
            "fields": {
                "First Name": "Reed",
                "Middle Name": "R",
                "MM": "04",
                "DD": "21",
                "YY": "24",
                "Initial": "RRG",
                # "I DO NOT Agree": null,
                # "Last Name": null
            },
        }
    },
    "test2": {
        "name": "Test2 Form",
        "path": "test2-1.pdf",
        "usage": {
            "fields": {
                "p2 First Name": "Joe",
                "p2 Middle Name": "S",
                "p2 MM": "03",
                "p2 DD": "31",
                "p2 YY": "24",
                "Initial": "JSS",
                # "p2 I DO NOT Agree": "null",
                "p2 Last Name": "Smith",
                "p3 First Name": "John",
                "p3 Middle Name": "R",
                "p3 MM": "01",
                "p3 DD": "25",
                "p3 YY": "21"
            },
        }
    }
}
pdfOut = "merged2.pdf"
merger = PdfWriter()
for file in myFiles:
    print(file)
    reader = PdfReader(myFiles[file]["path"])
    reader.add_form_topname(file)
    writer = PdfWriter(clone_from=reader)
    # Update form fields for each page in the current PDF
    for page in writer.pages:
        print("page", page.page_number)
        writer.update_page_form_field_values(
            page,
            myFiles[file]["usage"]["fields"],
            auto_regenerate=False
        )
    merger.append(writer)
# Write the merged PDF to the output file
merger.write(pdfOut)
Thanks for working on this!
Just following up to see if I understand, does #2570 make it so that a field with the same name is filled in across all places, or does it only fill in the first value?
It should modify all "displays" (annotations) that refer to the field. The only point is to make sure that all pages are included. For this purpose I recommend waiting for #2571, which will make it easier to update all pages (using page=None).