Pdf Form labels/values

Open typhoon71 opened this issue 5 years ago • 1 comment

I'm trying to read the labels and values from a fillable PDF form. I used:

x = pdfrw.PdfReader(path)

This gave me a dict with a ['/Root']['/AcroForm']['/Fields'] structure inside, but I can't find the form values I need.

[pdfminer gives a similar structure, but it has a resolver that gets the labels/values out of that dict; I couldn't find an equivalent for pdfrw.]

Using PyPDF2 I could do:

x = PyPDF2.PdfFileReader(path)
d = x.getFields() 

and I would get fields/values of the form.

Is this possible with pdfrw? I couldn't find anything in the examples so I'm asking it here. If it's possible it would be nice to have an example for this too. (please, thanks)

typhoon71, Apr 14 '19 13:04

Try this:

import pdfrw

ANNOT_KEY = "/Annots"
ANNOT_FIELD_KEY = "/T"
ANNOT_VAL_KEY = "/V"
SUBTYPE_KEY = "/Subtype"
WIDGET_SUBTYPE_KEY = "/Widget"

PDF_NAME = "test.pdf"

template_pdf = pdfrw.PdfReader(PDF_NAME)
for page in template_pdf.pages:
    # A page with no annotations has no /Annots entry (pdfrw returns None).
    annotations = page[ANNOT_KEY] or []
    for annotation in annotations:
        # Form fields show up on the page as widget annotations.
        if annotation[SUBTYPE_KEY] != WIDGET_SUBTYPE_KEY:
            continue
        name = annotation[ANNOT_FIELD_KEY]
        if not name:
            continue
        value = annotation[ANNOT_VAL_KEY]
        # Print "name = value", or just the name when the field is empty.
        if value:
            print("{} = {}".format(name, value))
        else:
            print(name)

gbroiles, Feb 22 '20 21:02
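
The ['/Root']['/AcroForm']['/Fields'] structure from the original question can also be walked directly. In the PDF format, each field dict carries a partial name in /T, an optional value in /V, and optional children in /Kids; the full field name is the partial names joined with periods, and a child without its own /V inherits the parent's. Below is a rough sketch of that walk. walk_fields and _decode are hypothetical names, not pdfrw functions, and to_unicode() is again assumed to exist on pdfrw strings.

```python
def _decode(obj):
    """Decode pdfrw string objects when possible; fall back to str()."""
    to_unicode = getattr(obj, "to_unicode", None)  # assumed PdfString method
    return to_unicode() if callable(to_unicode) else str(obj)


def walk_fields(fields, prefix="", inherited=None):
    """Yield (fully qualified name, value) for each terminal form field."""
    for field in fields or []:
        partial = field.get("/T")
        value = field.get("/V")
        if value is None:
            value = inherited  # /V is inherited from the parent field
        name = prefix
        if partial is not None:
            part = _decode(partial)
            name = prefix + "." + part if prefix else part
        # Kids without /T are widget annotations, not child fields.
        kids = [k for k in field.get("/Kids") or [] if k.get("/T") is not None]
        if kids:
            yield from walk_fields(kids, name, value)
        elif partial is not None:
            yield name, value


# Stand-in data shaped like an AcroForm /Fields array. With pdfrw this
# would be the list found at x['/Root']['/AcroForm']['/Fields'].
demo_fields = [
    {"/T": "applicant", "/Kids": [
        {"/T": "first", "/V": "Ada"},
        {"/T": "last", "/V": "Lovelace"},
    ]},
    {"/T": "agree", "/V": "/Yes", "/Kids": [{"/Subtype": "/Widget"}]},
]
print(dict(walk_fields(demo_fields)))
```

Compared with scanning /Annots page by page, this picks up nested field names, but it depends on the document having an /AcroForm entry at all.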