PyPDFForm

FormWrapper and PdfWrapper cannot fill image field

Open · 73VW opened this issue 10 months ago · 10 comments

Hello there,

It seems that the package has difficulty identifying and filling image selection fields.

Here's an example of a file with such a field: https://www.liguepulmonaire.ch/sites/default/files/documents/verordnungsformular_4f.pdf

That's a prescription template.

The library detects the field as a checkbox, but the field doesn't have the /AP /D properties.

Here's the value of the widget:

{'/A': IndirectObject(665, 0, 140288380589648), '/DA': '/SyntaxLTStd-Roman 10 Tf 0 g', '/F': 4, '/FT': '/Btn', '/Ff': 65536, '/MK': {'/IF': {'/FB': True}, '/TP': 1}, '/P': IndirectObject(485, 0, 140288380589648), '/Rect': [382.767, 123.965, 541.868, 215.683], '/Subtype': '/Widget', '/T': 'ImageSign', '/Type': '/Annot'}
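For reference, some entries in this widget dictionary have meanings defined in ISO 32000: /FT /Btn marks a button field, and /MK /TP (caption position, defined for pushbutton fields only) with value 1 means "no caption; icon only", which fits an image widget. The sketch below decodes those entries from a plain-dict copy of the widget above, with the indirect references (/A, /P) left out. `describe_widget` is a hypothetical helper for illustration, not part of PyPDFForm.

```python
# Caption/icon layout codes for /MK /TP, per ISO 32000 (pushbutton fields only).
TP_MEANINGS = {
    0: "no icon; caption only",
    1: "no caption; icon only",
    2: "caption below the icon",
    3: "caption above the icon",
    4: "caption to the right of the icon",
    5: "caption to the left of the icon",
    6: "caption overlaid on the icon",
}


def describe_widget(widget: dict) -> dict:
    """Hypothetical helper: summarize a few button-related widget entries."""
    mk = widget.get("/MK", {})
    return {
        "name": widget.get("/T"),
        "is_button": widget.get("/FT") == "/Btn",
        # /TP defaults to 0 (caption only) when absent.
        "layout": TP_MEANINGS.get(mk.get("/TP", 0), "unknown"),
        # /IF is the icon fit dictionary, also defined for pushbuttons only.
        "has_icon_fit": "/IF" in mk,
    }


# The widget from this issue, minus the indirect references (/A, /P).
widget = {
    "/DA": "/SyntaxLTStd-Roman 10 Tf 0 g",
    "/F": 4,
    "/FT": "/Btn",
    "/Ff": 65536,
    "/MK": {"/IF": {"/FB": True}, "/TP": 1},
    "/Rect": [382.767, 123.965, 541.868, 215.683],
    "/Subtype": "/Widget",
    "/T": "ImageSign",
    "/Type": "/Annot",
}

print(describe_widget(widget))
# {'name': 'ImageSign', 'is_button': True, 'layout': 'no caption; icon only', 'has_icon_fit': True}
```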

I will try to reverse-engineer the content of the field, as I did in #559, to fix this, but I guess that won't be easy.

73VW, Apr 08 '24

The wrong detection seems to come from these lines, which catch all buttons: https://github.com/chinapandaman/PyPDFForm/blob/master/PyPDFForm/patterns.py#L25-L26
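To illustrate why a pattern that only checks /FT /Btn over-matches, here is a hypothetical key/value matcher in the spirit of such a pattern; it is not PyPDFForm's actual patterns.py code. Checkboxes, radio buttons and pushbuttons (including a clear form button) all share /FT /Btn, so all of them match:

```python
def matches(widget: dict, pattern: dict) -> bool:
    """Hypothetical matcher: every key/value pair in `pattern` must appear in `widget`."""
    return all(widget.get(key) == value for key, value in pattern.items())


BUTTON_ONLY = {"/FT": "/Btn"}

widgets = {
    "checkbox": {"/FT": "/Btn"},
    "radio": {"/FT": "/Btn"},
    "image_button": {"/FT": "/Btn"},
    # A pushbutton whose action resets the form.
    "clear_form": {"/FT": "/Btn", "/A": {"/S": "/ResetForm"}},
    # A text field, for contrast.
    "text": {"/FT": "/Tx"},
}

matched = [name for name, w in widgets.items() if matches(w, BUTTON_ONLY)]
print(matched)  # ['checkbox', 'radio', 'image_button', 'clear_form']
```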

73VW, Apr 09 '24

So, image fields are not a widget type that the library supports. I don't think this is explicitly stated in the docs, but I can add it later if it ever causes confusion.

The reason is simple: I do not know what pattern precisely characterizes a widget as an image field. If this ever becomes clear, I can make PdfWrapper support it. That would not be hard, given that it already supports signature fields and draw_image. I don't think FormWrapper will ever support image fields, because I don't think I can feed an image file stream into an image widget even if I know it's an image field. This is the same reason FormWrapper doesn't support signature fields, and I made that very clear in the docs.

chinapandaman, Apr 09 '24

Hey @chinapandaman,

Indeed, I've searched the internet and didn't find a standard for image fields. I totally understand that it will be hard to implement support without one.

If you wish, I can try to support the signature field in FormWrapper first.

73VW, Apr 10 '24

You are more than welcome to try getting the signature field to work for FormWrapper! If it helps at all, this is by far the best resource I have been referring to: the ISO 32000 PDF standard, where section 12.7.4 (page 447) contains most of the information about form widgets.

chinapandaman, Apr 10 '24

Hey @73VW, it's been a little over a week now. Do you mind if I close this issue? When you have something, we can always open another one.

chinapandaman, Apr 19 '24

Hey, so I spent today looking into this a bit more. When I compared your PDF with one of my own PDF forms that has an image field, I noticed something both PDFs' image widgets have in common: both have an /A property with three sub-properties:

  • /Type with a value /Action
  • /S with a value /JavaScript
  • /JS with a value event.target.buttonImportIcon();

I believe the /JS property's script is what gets executed when the image field widget is clicked. So I googled event.target.buttonImportIcon(); and all the results I got relate to image fields in PDF forms.

So I think this is good enough evidence for identifying image fields, and I went ahead and wrote this PR. I added your PDF form as a test case and it seems to work fine.
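As a rough illustration of the heuristic described above (not the code from the PR), a button can be flagged as an image field when its /A action is JavaScript that calls `event.target.buttonImportIcon()`. The sketch assumes indirect references have already been resolved (for example with pypdf's `get_object()`) and that /JS is a plain string; in real PDFs it can also be a stream. `looks_like_image_field` is a hypothetical name.

```python
# Script found in the /A action of both image widgets compared in this thread.
IMPORT_ICON_JS = "event.target.buttonImportIcon()"


def looks_like_image_field(widget: dict) -> bool:
    """Hypothetical helper: is this a button whose action runs buttonImportIcon()?"""
    if widget.get("/FT") != "/Btn":
        return False
    action = widget.get("/A") or {}
    # /JS may be a stream in real PDFs; this sketch only handles plain strings.
    script = action.get("/JS", "")
    return (
        action.get("/S") == "/JavaScript"
        and isinstance(script, str)
        and IMPORT_ICON_JS in script
    )


image_widget = {
    "/FT": "/Btn",
    "/Ff": 65536,
    "/A": {"/Type": "/Action", "/S": "/JavaScript", "/JS": "event.target.buttonImportIcon();"},
}
plain_button = {"/FT": "/Btn", "/Ff": 65536}  # a pushbutton without that script

print(looks_like_image_field(image_widget))  # True
print(looks_like_image_field(plain_button))  # False
```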

Anyway, the changes are released. Give v1.4.21 a try and tell me what you think. Docs are also updated.

chinapandaman, Apr 20 '24

Hey buddy @chinapandaman,

Sorry for the delay; I have been quite busy recently.

I will take a look at your PR; that's very cool!

Best.

73VW, May 27 '24

Hello everyone, I was trying to fill an image field in a PDF created by LibreOffice. I do not see the JavaScript code mentioned here (maybe it is something custom, added so the image can conveniently be set from a reader), so the image field is not recognized as such. In a reader or in a browser, nothing happens when I click on the image field.

Experimenting, I created a PDF in LibreOffice (exported from a DOCX) with both a "normal" button and an "image" button, and it seems to me that once exported they are basically the same. I edited the code here to treat every button as an image field, using the pattern (({FT: Btn},),), and both buttons are correctly filled with a picture as if they were image fields. Here are both the DOCX and PDF files.

I think an image field is just a normal button (sometimes with extra JS code or a custom appearance to visually differentiate it) and should perhaps be treated as such, so that we can fill any button with an image.

Looking at other libraries that interact with PDFs, I also see that PDF-LIB doesn't have a specific class for image fields; instead, its PDFButton class has a setImage method.

cip91sk, Sep 25 '24

Hey @cip91sk , thanks for posting.

Unfortunately, after some experimenting, I cannot make the changes you proposed, for two reasons:

  1. Currently, the library tests filling image fields using two different PDFs: sample_template_with_image_field.pdf and 560.pdf, the one @73VW brought up in this thread. If I open either of them in Adobe Acrobat and click its image field, a popup appears that lets me insert an image into that field (see screenshots). I also tried your PDF and, as you said, nothing happens when it is clicked. To me, this behavior is what classifies a widget as an image field, and I do think the key classifier is the JavaScript code described above.
  2. I cannot just make (({FT: Btn},),) the classifier for image fields, because it matches too many other types of widgets. It is the base classifier for checkboxes and radio buttons, and it also matches other types of buttons that the library doesn't support, such as clear form. Making it the image field classifier would introduce a lot of problems to the library, both now and in the future.

Let me know if you have more questions.

[Two screenshots of the Adobe Acrobat image-insertion popup]

chinapandaman, Sep 25 '24

Yes, I understand your concerns. Sorry, I did not make myself clear about the classifier: it was just a quick-and-dirty way to test whether simple buttons could have an image set, not a proposal to actually make that change in the code.

I think there should be a way to recognize all buttons without breaking checkboxes and radio buttons, and as for the clear form button, I think it could also have an image set. More specifically, I think all "pressable" buttons (i.e. not checkboxes and radio buttons) should support setting an image.

EDIT: In the latest PDF specification (found here), it seems that fields can be of four types (Table 220):

  • Button
  • Text
  • Choice
  • Signature

Button fields can only be of three types (section 12.7.4.2.1, "Button fields: General"):

  • Pushbutton
  • Checkbox
  • Radiobutton

For these fields, several field flags (/Ff) are defined, but I think the most useful one here is the Pushbutton flag (bit 17), which "shall be set to one" for pushbuttons and "shall be clear" for both checkboxes and radio buttons.

There is no mention of image fields in the standard (at least none that I found; admittedly, I haven't read all of it).

Given this information, I think every /FT /Btn field with bit 17 set in /Ff can be considered a pushbutton and can have an image set. Since bit 17 shall be clear for checkboxes and radio buttons, there shouldn't be any overlap in the detection.
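A minimal sketch of the proposed rule, assuming /FT and /Ff sit directly on the widget (both are inheritable field entries, so real code may need to look them up on the parent field). PDF flag bits are 1-based, so bit 17 is `1 << 16`, i.e. 65536, the /Ff value of the widget in this issue. The LibreOffice-style entry is an assumption: the thread does not show that button's dictionary, only that it has no JavaScript action. `is_pushbutton` is a hypothetical name.

```python
PUSHBUTTON_FLAG = 1 << 16  # /Ff bit 17 (bits are 1-based), per ISO 32000
RADIO_FLAG = 1 << 15       # /Ff bit 16


def is_pushbutton(widget: dict) -> bool:
    """Hypothetical helper: a /FT /Btn field with the Pushbutton flag set."""
    return widget.get("/FT") == "/Btn" and bool(widget.get("/Ff", 0) & PUSHBUTTON_FLAG)


widgets = {
    # Image field from this issue: Pushbutton flag plus the buttonImportIcon() action.
    "ImageSign": {
        "/FT": "/Btn",
        "/Ff": 65536,
        "/A": {"/S": "/JavaScript", "/JS": "event.target.buttonImportIcon();"},
    },
    # Assumed LibreOffice-style pushbutton: flag set, no JavaScript action.
    "libreoffice_button": {"/FT": "/Btn", "/Ff": PUSHBUTTON_FLAG},
    "checkbox": {"/FT": "/Btn", "/Ff": 0},
    "radio": {"/FT": "/Btn", "/Ff": RADIO_FLAG},
}

pushbuttons = [name for name, w in widgets.items() if is_pushbutton(w)]
print(pushbuttons)  # ['ImageSign', 'libreoffice_button']
```

Unlike the JavaScript-based check, this rule would also pick up buttons without the import-icon script, such as a clear form button, which is exactly the broader behavior being proposed.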

cip91sk, Sep 26 '24