pdfrw
PDF form labels/values
I'm trying to read the labels and values from a fillable PDF form. I used:
x = pdfrw.PdfReader(path)
This gave me a dict containing a ['/Root']['/AcroForm']['/Fields'] structure, but I can't find the form values I need in it.
(pdfminer gives a similar structure and has a resolver that extracts the labels/values from that dict, but I couldn't find an equivalent in pdfrw.)
Using PyPDF2 I could do:
x = PyPDF2.PdfFileReader(path)
d = x.getFields()
and I would get fields/values of the form.
Is this possible with pdfrw? I couldn't find anything in the examples, so I'm asking here. If it is possible, it would be nice to have an example for this too. (please, thanks)
Try this:
import pdfrw

ANNOT_KEY = "/Annots"
ANNOT_FIELD_KEY = "/T"
ANNOT_VAL_KEY = "/V"
SUBTYPE_KEY = "/Subtype"
WIDGET_SUBTYPE_KEY = "/Widget"
PDF_NAME = "test.pdf"

template_pdf = pdfrw.PdfReader(PDF_NAME)
for page in template_pdf.pages:
    # A page without annotations returns None for /Annots, so fall back to an empty list.
    annotations = page[ANNOT_KEY] or []
    for annotation in annotations:
        # Form fields are stored as widget annotations; skip links, comments, etc.
        if annotation[SUBTYPE_KEY] == WIDGET_SUBTYPE_KEY:
            if annotation[ANNOT_FIELD_KEY]:
                # /T holds the field name (label), /V the current value, if any.
                name = annotation[ANNOT_FIELD_KEY]
                print("{} ".format(name), end="")
                if annotation[ANNOT_VAL_KEY]:
                    value = annotation[ANNOT_VAL_KEY]
                    print("= {}".format(value))
                else:
                    print()
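
If you would rather get a dict back, roughly like PyPDF2's getFields(), the same loop can be wrapped in a function. This is only a sketch: collect_form_fields and read_form_fields are names I made up, not part of pdfrw, and it assumes string values are pdfrw PdfString objects with a decode() method (anything else is passed through str()). It does not handle fields whose name lives on a /Parent entry.

```python
WIDGET = "/Widget"


def _plain(value):
    """Return a readable form of a field name or value (None stays None)."""
    if value is None:
        return None
    # Assumption: pdfrw's PdfString exposes decode(), which strips the PDF
    # string delimiters. Objects without decode() are converted with str().
    decode = getattr(value, "decode", None)
    return decode() if callable(decode) else str(value)


def collect_form_fields(pages):
    """Map field name -> value for every widget annotation on the given pages.

    `pages` is any iterable of dict-like page objects, e.g. PdfReader(...).pages.
    """
    fields = {}
    for page in pages:
        # Pages without annotations have no /Annots entry.
        for annot in page.get("/Annots") or []:
            if annot.get("/Subtype") != WIDGET:
                continue
            name = annot.get("/T")
            if name is None:
                continue
            fields[_plain(name)] = _plain(annot.get("/V"))
    return fields


def read_form_fields(path):
    """Hypothetical convenience wrapper: open `path` with pdfrw and collect fields."""
    import pdfrw  # imported here so collect_form_fields works on plain dicts too

    return collect_form_fields(pdfrw.PdfReader(path).pages)
```

Usage would be something like read_form_fields("test.pdf"), which returns a dict with None for fields that have no value yet.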