XFA fields not updated when using update_page_form_field_values()
Environment
Python 3.10, pypdf 4.3.1+dev (as of September 1st)
Code + PDF
See #2780. When modifying a PDF that contains an XFA form, the fields in the XFA dataset are not updated.
For my use case I found a solution by "just" parsing the xfa:datasets XML, setting the values, and saving the XML string back. The question is: is that a valid approach for every XFA form? If it is, I'll gladly write a PR that enhances the update_page_form_field_values method, or implement an additional method to accomplish this. But I'm not sure whether my approach is more than a shortcut.
Working only on the XFA will not allow standard tools to extract data from the field information. My idea is simply to extend the existing update_form_fields to also update the XFA dataset if it exists.
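A minimal sketch of that idea, using only the standard library. The function name `set_xfa_dataset_value` and the sample XML are hypothetical and for illustration only; reading the datasets packet out of the PDF and writing it back with pypdf is not shown here. It assumes the field appears in the datasets as an element whose local name equals the field name, which may not hold for every XFA form.

```python
import xml.etree.ElementTree as ET

# Namespace of the XFA data packet; registering it keeps the "xfa:" prefix on output.
XFA_DATA_NS = "http://www.xfa.org/schema/xfa-data/1.0/"
ET.register_namespace("xfa", XFA_DATA_NS)


def set_xfa_dataset_value(datasets_xml: str, field_name: str, value: str) -> str:
    """Set the text of every element named `field_name` and return the new XML."""
    root = ET.fromstring(datasets_xml)
    for element in root.iter():
        # Drop a possible "{namespace}" prefix before comparing names.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == field_name:
            element.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical datasets packet matching the test form discussed in this issue.
sample = (
    '<xfa:datasets xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/">'
    "<xfa:data><form1><F1/></form1></xfa:data>"
    "</xfa:datasets>"
)
updated = set_xfa_dataset_value(sample, "F1", "hello")
print(updated)
```

Note that matching on the bare element name updates every element with that name, so forms with repeated subforms would need a more specific lookup.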
I identified something very interesting during the implementation of the proposed extension of update_form_fields.
The XFA "keys" of fields are different from the names pypdf uses for the AcroForm. To verify this, I created
pypdf_field_name_test.pdf. As the screenshot shows, the field is called F1.
If you check the key provided by pypdf you can see that it is 'F1[0]'. You can check with the code below.
from pypdf import PdfReader
reader = PdfReader("pypdf_field_name_test.pdf")
fields = reader.get_form_text_fields()
print(fields)
{'F1[0]': None}
If you look at the XFA template / dataset XML, the field is named F1.
<template xmlns="http://www.xfa.org/schema/xfa-template/3.3/"><?formServer defaultPDFRenderFormat acrobat10.0dynamic?>
<subform name="form1" layout="tb" locale="de_DE" restoreState="auto">
<pageSet>
<pageArea name="Page1" id="Page1">
<contentArea x="0.25in" y="0.25in" w="197.3mm" h="284.3mm"/>
<medium stock="a4" short="210mm" long="297mm"/><?templateDesigner expand 1?>
</pageArea><?templateDesigner expand 1?>
</pageSet>
<subform w="197.3mm" h="284.3mm" name="topform">
<field name="F1" y="12.7mm" x="41.275mm" w="130.175mm" h="9mm">
<ui>
<textEdit>
<border>
<edge stroke="lowered"/>
</border>
<margin/>
</textEdit>
</ui>
<font typeface="Arial"/>
<para vAlign="middle"/>
<caption>
<para vAlign="middle"/>
<value>
<text>This is test of pypdf field names</text>
</value>
</caption>
</field><?templateDesigner expand 1?>
</subform>
<proto/>
<desc>
<text name="version">11.0.9.20240701.1.52.2</text>
</desc><?templateDesigner expand 1?><?renderCache.subset "Arial" 0 0 ISO-8859-1 4 72 18 0003002900370044004700480049004B004C004F005000510052005300560057005B005C FTadefhilmnopstxy?>
</subform><?templateDesigner DefaultPreviewDynamic 1?><?templateDesigner DefaultRunAt client?><?templateDesigner FormTargetVersion 33?><?templateDesigner DefaultCaptionFontSettings face:Arial;size:10;weight:normal;style:normal?><?templateDesigner DefaultValueFontSettings face:Arial;size:10;weight:normal;style:normal?><?templateDesigner DefaultLanguage JavaScript?><?acrobat JavaScript strictScoping?><?templateDesigner Rulers horizontal:1, vertical:1, guidelines:1, crosshairs:0?><?templateDesigner Zoom 190?><?templateDesigner WidowOrphanControl 0?><?templateDesigner SaveTaggedPDF 1?><?templateDesigner SavePDFWithEmbeddedFonts 1?><?templateDesigner Grid show:1, snap:1, units:0, color:ff8080, origin:(0,0), interval:(125000,125000), objsnap:0, guidesnap:0, pagecentersnap:0?>
</template>
I suspect that naming the fields with [0] was a deliberate choice in the implementation.
The question that arises now: shouldn't the names in the XFA and the AcroForm be identical? And if not, would removing the [0] in order to update the XFA be a valid approach?
In my opinion the field names should be consistent, and therefore the AcroForm names should not contain [0].
Best regards, Leon
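For illustration, the name mapping asked about above could be sketched as follows. The helper `acroform_to_xfa_name` is hypothetical, not part of pypdf. It assumes the bracketed suffix is an occurrence index and that only the last dotted segment is needed, so it would conflate repeated fields that differ only by index.

```python
import re

# Matches an index suffix such as "[0]" or "[12]".
_INDEX = re.compile(r"\[\d+\]")


def acroform_to_xfa_name(acroform_name: str) -> str:
    """Return the last name segment with its index removed, e.g. 'F1[0]' -> 'F1'."""
    # Keep only the part after the final dot of a qualified name.
    last_segment = acroform_name.rsplit(".", 1)[-1]
    return _INDEX.sub("", last_segment)


print(acroform_to_xfa_name("F1[0]"))
```

Whether dropping the index is safe in general is exactly the open question in this thread; the sketch only shows the mechanical step.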
Some information is provided in https://pdfa.org/norm-refs/XFA-3_3.pdf
See "Field names", page 72 onward.
Hi @pubpub-zz / @ljbergmann, have we come to a conclusion on how to update field values in a PDF that uses XFA fields?
In the end, I only care about a final PDF that can be opened by end users with the fields populated.
from pypdf import PdfReader, PdfWriter

if __name__ == "__main__":
    reader = PdfReader(ONTARIO_TENANCY_AGREEMENT_PDF_TEMPLATE_PATH)
    writer = PdfWriter()
    # append() copies the pages into the new writer
    writer.append(reader)
    with open(OUTPUT_PATH, "wb") as output_stream:
        writer.write(output_stream)
The code above just gives me a PDF with a single page that says:
If this message is not eventually replaced by the proper contents of the document, your PDF
viewer may not be able to display this type of document.
Seems like my XFA gets erased.
As this issue is still open and does not have a PR, there is no final solution/conclusion on this.
You are of course invited to investigate this issue yourself, document your findings and propose a PR afterwards.
@ericxiao251
XFA forms are part of the document itself, not of its pages. When you use .append(), the XFA cannot be copied (copying it would erase the existing entry in the writer, even an empty one).
Try something like this:
from pypdf import PdfReader, PdfWriter

if __name__ == "__main__":
    reader = PdfReader(ONTARIO_TENANCY_AGREEMENT_PDF_TEMPLATE_PATH)
    # clone_from copies the whole document, not just the pages
    writer = PdfWriter(clone_from=reader)
    with open(OUTPUT_PATH, "wb") as output_stream:
        writer.write(output_stream)
or quicker:
from pypdf import PdfWriter

if __name__ == "__main__":
    writer = PdfWriter(clone_from=ONTARIO_TENANCY_AGREEMENT_PDF_TEMPLATE_PATH)
    writer.write(OUTPUT_PATH)