pypdf

Working with checkboxes that have /Kids or a strange /V

Luisonson opened this issue 2 years ago • 14 comments

I'm trying to automate filling this PDF: TEMPORAL COMPLETO12 de mayo_unlocked.pdf

I have no problem with the text fields, but I can't get the checkboxes to work at all. Many /Btn fields have /Kids, and those kids are other checkboxes that only appear as IndirectObject. Also, there are ordinary checkboxes in this PDF that I can't select or modify (examples below).

Code

This example was written for PyPDF2 1.26.0:

from PyPDF2 import PdfFileReader, PdfFileWriter
from PyPDF2.generic import BooleanObject, DictionaryObject, NameObject
from collections import OrderedDict

def set_need_appearances_writer(writer: PdfFileWriter):
    # See 12.7.2 and 7.7.2 for more information: http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
    try:
        catalog = writer._root_object
        # get the AcroForm tree
        if "/AcroForm" not in catalog:
            # Register a new, empty AcroForm dictionary as its own indirect object
            writer._root_object.update({
                NameObject("/AcroForm"): writer._addObject(DictionaryObject())
            })

        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
        # del writer._root_object["/AcroForm"]['NeedAppearances']
        return writer

    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))
        return writer

reader = PdfFileReader("TEMPORAL COMPLETO12 de mayo_unlocked.pdf")
writer = PdfFileWriter()

set_need_appearances_writer(writer)

page = reader.pages[0]

writer.addPage(page)

# Texto4 works, but the checkboxes do not
writer.updatePageFormFieldValues(
    writer.getPage(0), {'BOTON_TIPOJORNADA': '/1',
                        'BOTON_JORN': '/S',
                        'Texto4': 'Texto4'
                        }
)
with open("filled-out.pdf", "wb") as output_stream:
    writer.write(output_stream)
reader.stream.close()

If I modify the PDF manually and then read the fields:

reader.getFields()

OUTPUT (one checkbox selected):
[...]

'BOTON_JORN': {'/FT': '/Btn',
  '/Kids': [IndirectObject(160, 0),
   IndirectObject(162, 0),
   IndirectObject(167, 0),
   IndirectObject(172, 0)],
  '/T': 'BOTON_JORN',
  '/Ff': 49152,
  '/V': '/S'},

OUTPUT (another checkbox selected):
[...]

'BOTON_JORN': {'/FT': '/Btn',
  '/Kids': [IndirectObject(160, 0),
   IndirectObject(162, 0),
   IndirectObject(167, 0),
   IndirectObject(172, 0)],
  '/T': 'BOTON_JORN',
  '/Ff': 49152,
  '/V': '/D'},
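The /Ff value 49152 is worth decoding. In the field-flag table for button fields in ISO 32000-1 (Table 226), 49152 = 16384 + 32768, i.e. bit 15 (NoToggleToOff) plus bit 16 (Radio). So BOTON_JORN is most likely a radio-button group rather than a set of independent checkboxes, which would explain why its /V switches between the kids' state names (/S, /D). A small stdlib sketch of the decoding; the helper names are hypothetical, not pypdf API:

```python
# Button-field flag bits (1-based positions) as listed in ISO 32000-1, Table 226.
BUTTON_FLAGS = {
    15: "NoToggleToOff",
    16: "Radio",
    17: "Pushbutton",
    26: "RadiosInUnison",
}


def decode_button_flags(ff: int) -> set:
    """Return the names of the button flags that are set in an /Ff value."""
    return {name for bit, name in BUTTON_FLAGS.items() if ff & (1 << (bit - 1))}


def button_kind(ff: int) -> str:
    """Classify a /Btn field as 'pushbutton', 'radio' or 'checkbox'."""
    flags = decode_button_flags(ff)
    if "Pushbutton" in flags:
        return "pushbutton"
    if "Radio" in flags:
        return "radio"
    return "checkbox"


print(sorted(decode_button_flags(49152)))  # ['NoToggleToOff', 'Radio']
print(button_kind(49152))  # radio
```

A /Btn field with neither flag set is a checkbox, which matches 'TEXTOCasilla de verificación25' having no /Ff at all.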

Another checkbox that has no /Kids but that I still can't select or modify is 'TEXTOCasilla de verificación25'; when it is selected, its value is '/S#ED':

'TEXTOCasilla de verificación25': {'/FT': '/Btn',
  '/T': 'TEXTOCasilla de verificación25',
  '/V': '/S#ED'},
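'/S#ED' is less strange than it looks: in a PDF name, # followed by two hex digits encodes one byte (ISO 32000-1, section 7.3.5), and the hex digits are not case-sensitive, so /S#ED here and /S#ed in the qpdf dump later in this thread are the same name. Assuming Latin-1 for that byte, 0xED is "í", so the on-state most likely reads "Sí" (Spanish for "yes"). A minimal sketch of the unescaping; `decode_pdf_name` is a hypothetical helper:

```python
import re


def decode_pdf_name(name: str, encoding: str = "latin-1") -> str:
    """Expand #xx escapes in a PDF name (ISO 32000-1, section 7.3.5).

    The resulting bytes are decoded with `encoding`; Latin-1 is an assumption
    that fits this Spanish form, not something the name itself declares.
    """
    raw = re.sub(
        rb"#([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1).decode("ascii"), 16)]),
        name.encode("latin-1"),
    )
    return raw.decode(encoding)


print(decode_pdf_name("/S#ED") == "/Sí")  # True
print(decode_pdf_name("/S#ed") == decode_pdf_name("/S#ED"))  # True
print(decode_pdf_name("/Off"))  # /Off
```

When writing the value back (for example as a NameObject), it is the escaped spelling /S#ED that has to match the key in the checkbox's /AP dictionary.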

Thanks for your time.

PDF

TEMPORAL COMPLETO12 de mayo_unlocked.pdf

Luisonson, Jun 08 '22

Thank you for your bug report!

Would you mind sharing your PyPDF2 version + the environment you're using? (It's part of the bug ticket template)

> I have no problem with the text, but with the checkboxes there is no way.

What does that mean? There is no way to do what?

MartinThoma, Jun 08 '22

@Luisonson Have you seen https://pypdf2.readthedocs.io/en/latest/user/forms.html#filling-out-forms ? Does that help? If not, why?

MartinThoma, Jun 08 '22


Hello,

Thanks for your answer. I'm using Python 3.8.8 with PyPDF2 2.1.0. My IDE is Spyder 5.1.5.

I can't select (click) or deselect the checkboxes. Also, some checkboxes appear only as /Kids of another checkbox, so I can't interact with them. For example, the checkbox BOTON_JORN has 4 /Kids, and those kids are another 4 checkboxes; the only thing I know about them is that they are IndirectObject(X, 0).
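The IndirectObject(X, 0) entries are references (object number X, generation 0) to dictionaries stored elsewhere in the file. In pypdf and PyPDF2 2.x, calling get_object() on one returns that dictionary (PyPDF2 1.x called it getObject()). A minimal sketch that walks a field's /Kids down to its widget annotations and reads each widget's on-state from the keys of /AP /N. The helpers are hypothetical; the demo uses plain dicts, and two of BOTON_JORN's four kids are modeled with assumed appearance dictionaries (/S and /D are the values seen above):

```python
from typing import Any, Dict, Iterator, Optional


def resolve(obj: Any) -> Any:
    """Return the object an indirect reference points to.

    pypdf / PyPDF2 2.x objects provide get_object(); PyPDF2 1.x used
    getObject(). Plain dicts (as in the demo below) are returned unchanged.
    """
    for attr in ("get_object", "getObject"):
        if hasattr(obj, attr):
            return getattr(obj, attr)()
    return obj


def iter_widgets(field: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield the widget annotations of a field, following /Kids recursively."""
    kids = resolve(field.get("/Kids"))
    if not kids:
        # A field without /Kids is its own widget (merged field/widget dict).
        yield field
        return
    for kid_ref in kids:
        yield from iter_widgets(resolve(kid_ref))


def on_state(widget: Dict[str, Any]) -> Optional[str]:
    """Return the widget's 'on' appearance name: the /AP /N key that is not /Off."""
    appearance = resolve(widget.get("/AP", {}))
    normal = resolve(appearance.get("/N", {}))
    names = [name for name in normal if name != "/Off"]
    return names[0] if names else None


# Plain-dict stand-in for BOTON_JORN with two of its four kids (assumed shapes).
boton_jorn = {
    "/FT": "/Btn",
    "/T": "BOTON_JORN",
    "/Ff": 49152,
    "/V": "/S",
    "/Kids": [
        {"/AS": "/S", "/AP": {"/N": {"/S": None, "/Off": None}}},
        {"/AS": "/Off", "/AP": {"/N": {"/D": None, "/Off": None}}},
    ],
}
for widget in iter_widgets(boton_jorn):
    print(on_state(widget), widget.get("/AS"))
# Prints:
# /S /S
# /D /Off
```

Because pypdf's DictionaryObject subclasses dict, the same helpers should in principle accept `reader.get_fields()["BOTON_JORN"]` directly; this has not been tested against the attached PDF.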

> @Luisonson Have you seen https://pypdf2.readthedocs.io/en/latest/user/forms.html#filling-out-forms ? Does that help? If not, why?

Yes, part of the code I pasted is from there, but it does not work with the checkboxes in this PDF.

Luisonson, Jun 08 '22

Is the problem that it's not shown? So maybe #227 / #355?

MartinThoma, Jun 08 '22

Another hint, with this PDF (just page 5 of the previous PDF): filled-out_5.pdf

Updating the text boxes works. BUT if I also try (unsuccessfully) to update the checkboxes, the text in the text boxes is not shown unless I select the box. New code:

Updating two text boxes (these examples were written for PyPDF2 2.1.0):

from PyPDF2 import PdfReader, PdfWriter

reader = PdfReader("filled-out_5.pdf")
writer = PdfWriter()
page = reader.pages[0]
fields3 = reader.get_fields()

writer.add_page(page)

writer.update_page_form_field_values(
    writer.getPage(0), {"Texto41": "Test38",
                        "Texto56": "Test2"}
)
with open("filled-out_5_out.pdf", "wb") as output_stream:
    writer.write(output_stream)
reader.stream.close()

Updating two text boxes and trying to update one checkbox (this is when the text stops showing):

from PyPDF2 import PdfReader, PdfWriter

reader = PdfReader("filled-out_5.pdf")
writer = PdfWriter()
page = reader.pages[0]
fields3 = reader.get_fields()

writer.add_page(page)

writer.update_page_form_field_values(
    writer.getPage(0), {"Texto41": "Test38",
                        "Texto56": "Test2"}
)
writer.update_page_form_field_values(
    writer.getPage(0), {"BOTON_TPCON1": "/540"}
)

# Write the result to filled-out_5_out.pdf
with open("filled-out_5_out.pdf", "wb") as output_stream:
    writer.write(output_stream)
reader.stream.close()

There is another problem. After the new file is saved, if you try to read the fields of the new file with:

reader = PdfReader("filled-out_5_out.pdf")
reader.get_fields()

no fields are returned. I have to open the PDF in Adobe Acrobat and save it from there; after that, the code above works.

Luisonson, Jun 08 '22

> Is the problem that it's not shown? So maybe #227 / #355 ?

No. Previously I was using PyPDF2 1.26, and I had the code to mitigate that issue (def set_need_appearances_writer(writer: PdfFileWriter)) in my first message. With PyPDF2 2.1.0 that function is not needed... until you try to modify a checkbox, as I described in my previous message :(

Luisonson, Jun 08 '22

Oh, so it is a regression? It was working with 1.26 and now it is not working anymore with 2.1.0?

I'll have a closer look this evening after work :-)

MartinThoma, Jun 08 '22

> Oh, so it is a regression? It was working with 1.26 and now it is not working anymore with 2.1.0?

I'm sorry, maybe I'm mixing things up. There are several problems. On the one hand, I have problems with the checkboxes (that happens with both versions). On the other hand, there is the problem of the text not showing unless I select the text box; this second problem only appears in 2.1.0 when I try to change a checkbox, and the code that solved it in 1.26 does not seem to solve it in 2.1.0. Please use the last code I pasted; I think it will make things clearer than my poor explanation.

Luisonson, Jun 08 '22

I'll post a series of comments here to keep track / let people know how I investigate the issue.

# Split, so that we only have one page to care about
$ qpdf --split-pages=1 TEMPORAL.COMPLETO12.de.mayo_unlocked.pdf out.pdf

# Uncompress so that I can view it in an editor
$ qpdf --stream-data=uncompress out-01.pdf uncompressed-1.pdf

That gives uncompressed-1.pdf

MartinThoma, Jun 08 '22

Next I used PyPDF2 to find the form fields and their names. I looked for /Btn and found TEXTOCasilla de verificación25.

Before filling it:

<< /AP
<< /D
<< /Off 124 0 R /S#ed 125 0 R >> /N
<< /S#ed 126 0 R >> >>
/AS /Off
/DA (/ZaDb 0 Tf 0 0 1 rg) /F 4 /FT /Btn /MK
<< /CA (8) >> /P 3 0 R /Rect [ 51.3755 235.625 63.0763 248.636 ]
/Subtype /Widget /T (TEXTOCasilla de verificación25) /Type /Annot >>

After:

<< /AP
<< /D
<< /Off 171 0 R /S#ed 172 0 R >> /N
<< /S#ed 173 0 R >> >>
/AS /S#ed
/DA (/ZaDb 0 Tf 0 0 1 rg) /F 4 /FT /Btn /MK
<< /CA (8) >> /P 3 0 R /Rect [ 51.3755 235.625 63.0763 248.636 ]
/Subtype /Widget /T (TEXTOCasilla de verificación25) /Type /Annot
/V /S#ed >>

I notice two differences:

  1. /AS /Off changed to /AS /S#ed
  2. /V /S#ed was added.
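These two differences suggest a generic recipe for a checkbox whose field dictionary is also its widget: write the same state name to /AS (the appearance shown) and /V (the value). The on-state name differs from form to form (here /S#ed), but it can be read from the keys of /AP /N, while the PDF specification reserves /Off for the off state. A sketch under those assumptions; `set_checkbox` and its `make_name` parameter are hypothetical (with pypdf one would pass `make_name=NameObject`), and the demo uses a plain dict modeled on the dump above:

```python
from typing import Any, Callable, Dict


def set_checkbox(
    widget: Dict[str, Any],
    checked: bool,
    make_name: Callable[[str], Any] = str,
) -> None:
    """Tick or untick a checkbox whose field dictionary is also its widget.

    Writes the same name to /AS (which appearance is shown) and /V (the
    field value). The on-state is read from the /AP /N keys instead of
    being hard-coded.
    """
    normal = widget["/AP"]["/N"]
    on_names = [name for name in normal if name != "/Off"]
    if len(on_names) != 1:
        raise ValueError(f"expected exactly one on-state, found {on_names}")
    state = on_names[0] if checked else "/Off"
    widget[make_name("/AS")] = make_name(state)
    widget[make_name("/V")] = make_name(state)


# Plain-dict stand-in for the dump above (appearance streams replaced by strings).
casilla_25 = {
    "/FT": "/Btn",
    "/T": "TEXTOCasilla de verificación25",
    "/AS": "/Off",
    "/AP": {"/N": {"/S#ed": "126 0 R"}, "/D": {"/Off": "124 0 R", "/S#ed": "125 0 R"}},
}
set_checkbox(casilla_25, True)
print(casilla_25["/AS"], casilla_25["/V"])  # /S#ed /S#ed
```

Raising on zero or several on-states is deliberate: a widget with more than one non-Off appearance is more likely a radio kid, which needs different handling.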

MartinThoma, Jun 08 '22

@Luisonson This ticks one checkbox:

from PyPDF2 import PdfReader, PdfWriter
from PyPDF2.generic import NameObject
from typing import Dict

def update_checkbox_values(page, fields: Dict[str, bool]): 
    for annot_ref in page['/Annots']:
        writer_annot = annot_ref.get_object()
        field_name = writer_annot.get('/T')
        if field_name in fields:
            print(f"Found {field_name}")
            assert writer_annot.get('/FT') == '/Btn'
            print(writer_annot)
            if fields[field_name]:
                print("\tCheck it")
                writer_annot.update({
                    NameObject("/V"): NameObject("/S#ed"),
                    NameObject("/AS"): NameObject("/S#ed"),
                })
                for key in writer_annot:
                    print((key, writer_annot[key]))
            else:
                writer_annot.update({
                    NameObject("/V"): NameObject("/Off"),
                    NameObject("/AS"): NameObject("/Off")
                })


reader = PdfReader("TEMPORAL.COMPLETO12.de.mayo_unlocked.pdf")

# See which fields exist
fields = reader.get_fields()
print(fields)

writer = PdfWriter()
writer.set_need_appearances_writer()
writer.add_page(reader.pages[0])
update_checkbox_values(writer.pages[0], {"TEXTOCasilla de verificación25": True})


with open("filled-out.pdf", "wb") as output_stream:
    writer.write(output_stream)

Does this help?

MartinThoma, Jun 08 '22

Good morning, and thanks for your time and effort. We are getting closer. With page 5, for example: https://github.com/py-pdf/PyPDF2/files/8861867/filled-out_5.pdf

reader = PdfReader("filled-out_5.pdf")

# See which fields exist
fields = reader.get_fields()
print(fields)

OUTPUT:
{'TEXTOCasilla de verificación555': {'/FT': '/Btn', '/T': 'TEXTOCasilla de verificación555'},
 'BOTON_TPCON1': {'/FT': '/Btn',
                  '/Kids': [IndirectObject(55, 0), IndirectObject(1586, 0)],
                  '/T': 'BOTON_TPCON1',
                  '/Ff': 49152,
                  '/V': '/450401'},
 'Texto56': {'/FT': '/Tx', '/T': 'Texto56'},
 'Texto41': {'/FT': '/Tx', '/T': 'Texto41'},
 'BOTON_INT1': {'/FT': '/Btn',
                '/Kids': [IndirectObject(1597, 0), IndirectObject(1599, 0), IndirectObject(1604, 0),
                          IndirectObject(1609, 0), IndirectObject(1614, 0), IndirectObject(1619, 0),
                          IndirectObject(1624, 0), IndirectObject(1629, 0), IndirectObject(1634, 0),
                          IndirectObject(1639, 0), IndirectObject(1644, 0), IndirectObject(1649, 0),
                          IndirectObject(1654, 0)],
                '/T': 'BOTON_INT1',
                '/Ff': 49152},
 'BOTON_INT1357': {'/FT': '/Btn', '/T': 'BOTON_INT1357', '/Ff': 49152},
 'BOTON_INT166': {'/FT': '/Btn', '/T': 'BOTON_INT166', '/Ff': 49152}}

I need to change BOTON_TPCON1 from /450401 to /540. But with your example, writer.pages[0]['/Annots'][X].getObject().get('/T') only finds: Texto56, Texto41, BOTON_INT1357, BOTON_INT166.
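BOTON_TPCON1 is probably missed because its widgets are its /Kids, and in this kind of radio group the kids usually carry no /T: the name lives on the parent field, reachable through each widget's /Parent entry. Since its /Ff is 49152 (a radio group), the value would be set by writing /V on the parent and setting each kid's /AS to the chosen state if that kid's /AP /N has it, else /Off. A sketch with hypothetical helpers and a plain-dict model of BOTON_TPCON1 (the kids' appearance names are assumed; /540 and /450401 come from the comment above):

```python
from typing import Any, Callable, Dict


def resolve(obj: Any) -> Any:
    """Dereference a pypdf indirect object; plain dicts pass through unchanged."""
    return obj.get_object() if hasattr(obj, "get_object") else obj


def field_name(annot: Dict[str, Any]) -> str:
    """Build the fully qualified field name by walking up the /Parent chain.

    Widgets inside a radio group often have no /T of their own, so
    annot.get("/T") alone misses them; the name sits on an ancestor.
    """
    parts = []
    node = annot
    while node is not None:
        if "/T" in node:
            parts.append(str(node["/T"]))
        parent = node.get("/Parent")
        node = resolve(parent) if parent is not None else None
    return ".".join(reversed(parts))


def set_radio(
    parent: Dict[str, Any],
    value: str,
    make_name: Callable[[str], Any] = str,
) -> None:
    """Select `value` in a radio group: /V on the parent, /AS on every kid."""
    parent[make_name("/V")] = make_name(value)
    for kid_ref in resolve(parent["/Kids"]):
        kid = resolve(kid_ref)
        states = resolve(resolve(kid["/AP"])["/N"])
        # Only the kid that has an appearance for `value` is switched on.
        kid[make_name("/AS")] = make_name(value if value in states else "/Off")


# Plain-dict stand-in for BOTON_TPCON1; the kids' appearance names are assumed.
tpcon1 = {"/FT": "/Btn", "/T": "BOTON_TPCON1", "/Ff": 49152, "/V": "/450401"}
kid_a = {"/Parent": tpcon1, "/AS": "/450401", "/AP": {"/N": {"/450401": None, "/Off": None}}}
kid_b = {"/Parent": tpcon1, "/AS": "/Off", "/AP": {"/N": {"/540": None, "/Off": None}}}
tpcon1["/Kids"] = [kid_a, kid_b]

print(field_name(kid_b))  # BOTON_TPCON1
set_radio(tpcon1, "/540")
print(tpcon1["/V"], kid_a["/AS"], kid_b["/AS"])  # /540 /Off /540
```

In the writer loop from the earlier example, one would match on `field_name(annot)` instead of `annot.get('/T')`, then call `set_radio(resolve(annot['/Parent']), '/540', make_name=NameObject)` once for the group. This is untested against filled-out_5.pdf.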

so....

On the other hand, yesterday someone told me about FDF files. An FDF file is an ASCII template (easy to modify) that you open and merge with the PDF, and the PDF picks up the values from the FDF file. Is PyPDF2 capable of handling FDF files? If not, it would be a nice feature to add.

Luisonson, Jun 09 '22

I've seen fdf being mentioned somewhere, but I have no experience with it.

I'm open to PRs, but I also need to check if adding fdf support is in scope for PyPDF2.

MartinThoma, Jun 09 '22

For example, in my case, the FDF that changes some values on the first page is:

%FDF-1.2
%âãÏÓ
1 0 obj
<</FDF<</F(TEMPORAL COMPLETO12 de mayo_unlocked_borrar1.pdf)/Fields[
<</T(BOTON_BON1)/V/Off>>
<</T(BOTON_CLA1)/V/Off>>
<</T(BOTON_CLA13)/V/Off>>
<</T(BOTON_CLA166)/V/Off>>
<</T(BOTON_DISBON)/V/Off>>
<</T(BOTON_DISC1)/V/Off>>
<</T(BOTON_EX44)/V/Off>>
<</T(BOTON_EXCL)/V/Off>>
<</T(BOTON_INS)/V/Off>>
<</T(BOTON_INT1)/V/Off>>
<</T(BOTON_INT1357)/V/Off>>
<</T(BOTON_INT166)/V/Off>>
<</T(BOTON_INVEMP)/V/Off>>
<</T(BOTON_INVEMP2)/V/Off>>
<</T(BOTON_INVEMP266)/V/Off>>
<</T(BOTON_INVEMP266332)/V/Off>>
<</T(BOTON_INVEMP999)/V/Off>>
<</T(BOTON_INVEMP999635)/V/Off>>
<</T(BOTON_INVEMP9997895)/V/Off>>
<</T(BOTON_INVEMP99988)/V/Off>>
<</T(BOTON_INVTIPO)/V/Off>>
<</T(BOTON_INVTIPO11)/V/Off>>
<</T(BOTON_INVTIPO117)/V/Off>>
<</T(BOTON_ISOC8962)/V/Off>>
<</T(BOTON_JORN)/V/S>>
<</T(BOTON_JORNasdf)/V/Off>>
<</T(BOTON_JORNcvbm)/V/D>>
<</T(BOTON_MAY)/V/Off>>
<</T(BOTON_MODAL1)/V/Off>>
<</T(BOTON_OTR)/V/Off>>
<</T(BOTON_REL2)/V/Off>>
<</T(BOTON_TIPOJORNADA)/V/1>>
<</T(BOTON_TPCON1)/V/Off>>
<</T(BOTON_TPCON100)/V/Off>>
<</T(BOTON_TPCON1006)/V/Off>>
<</T(BOTON_TPCON12)/V/Off>>
<</T(BOTON_TPCON196)/V/Off>>
<</T(BOTON_TPCON1969)/V/Off>>
<</T(BOTON_TPCON1985)/V/Off>>
<</T(BOTON_TPCON198745)/V/Off>>
<</T(BOTON_TPCON199)/V/Off>>
<</T(BOTON_VICT)/V/Off>>
<</T(ID_EMPR)/V(16083466A)>>
<</T(TEXTO Casilla de verificación 480)/V/Off>>
<</T(TEXTO Casilla de verificación 481)/V/Off>>
<</T(TEXTO20369)/V/Off>>
<</T(TEXTOCasilla de verificación106666)/V/Off>>
<</T(TEXTOCasilla de verificación12)/V/Off>>
<</T(TEXTOCasilla de verificación13)/V/Off>>
<</T(TEXTOCasilla de verificación25)/V/S#ED>>
<</T(TEXTOCasilla de verificación285)/V/Off>>
<</T(TEXTOCasilla de verificación2853)/V/Off>>
<</T(TEXTOCasilla de verificación28999)/V/Off>>
<</T(TEXTOCasilla de verificación289996)/V/Off>>
<</T(TEXTOCasilla de verificación3221)/V/Off>>
<</T(TEXTOCasilla de verificación32369)/V/Off>>
<</T(TEXTOCasilla de verificación327)/V/Off>>
<</T(TEXTOCasilla de verificación32987)/V/Off>>
<</T(TEXTOCasilla de verificación3299)/V/Off>>
<</T(TEXTOCasilla de verificación369877)/V/Off>>
<</T(TEXTOCasilla de verificación4)/V/Off>>
<</T(TEXTOCasilla de verificación43)/V/Off>>
<</T(TEXTOCasilla de verificación43968)/V/Off>>
<</T(TEXTOCasilla de verificación5)/V/Off>>
<</T(TEXTOCasilla de verificación51)/V/Off>>
<</T(TEXTOCasilla de verificación5189)/V/Off>>
<</T(TEXTOCasilla de verificación518977)/V/Off>>
<</T(TEXTOCasilla de verificación555)/V/Off>>
<</T(TEXTOCasilla de verificación6)/V/Off>>
<</T(TEXTOCasilla de verificación62)/V/Off>>
<</T(TEXTOCasilla de verificación622222)/V/Off>>
<</T(TEXTOCasilla de verificación626)/V/Off>>
<</T(TEXTOCasilla de verificación64)/V/Off>>
<</T(TEXTOCasilla de verificación65)/V/Off>>
<</T(TEXTOCasilla de verificación66)/V/Off>>
<</T(TEXTOCasilla de verificación661)/V/Off>>
<</T(TEXTOCasilla de verificación69)/V/Off>>
<</T(TEXTOCasilla de verificación6911)/V/Off>>
<</T(TEXTOCasilla de verificación7)/V/Off>>
<</T(TEXTOCasilla de verificación72)/V/Off>>
<</T(TEXTOCasilla de verificación7222)/V/Off>>
<</T(TEXTOCasilla de verificación723)/V/Off>>
<</T(TEXTOCasilla de verificación726)/V/Off>>
<</T(TEXTOCasilla de verificación8)/V/Off>>
<</T(TEXTOCasilla de verificación91)/V/Off>>
<</T(TEXTOCasilla de verificación911)/V/Off>>
<</T(TEXTOCasilla de verificación95555)/V/Off>>
<</T(Textocasilla de verificación3)/V/Off>>
<</T(Textocasilla de verificación30)/V/Off>>]
/ID[<25F5DFD17199935FF41213A08FEAFF84><9F88950AEDB5B44BBCEF4494778262B8>]
/UF(TEMPORAL COMPLETO12 de mayo_unlocked_borrar1.pdf)>>/Type/Catalog>>
endobj
trailer
<</Root 1 0 R>>
%%EOF

As you can see, it is quite simple and self-explanatory. But PyPDF2 would have to be able to update any value. To merge the FDF file into the PDF I'm using pdftk, which is an old (9-year-old) executable... but it does the job.

As another example: for filled-out_5.pdf, where I told you I'm not able to change the checkbox BOTON_TPCON1, the FDF file is filled-out_5_datos.txt (rename .txt to .fdf). It is quite simple and seems to alter only the /V value.

To generate an FDF file, open the PDF file in Acrobat -> File -> Create -> Create Form.

Luisonson, Jun 09 '22

@MartinThoma I propose closing this issue, unless you plan some work on FDF files; that is too far away from PDF for me.

pubpub-zz, Jun 25 '23

I (sadly) have to agree: I don't see FDF support happening soon, and I don't see us making progress here.

I have added a link to https://github.com/py-pdf/pypdf/discussions/1181 . Feel free to add here or there more details on FDF (PRs introducing support would also be very welcome!).

Closing this reflects the fact that no core contributor will pick this up in the next half year. We want this support in pypdf, but we don't have the resources to make it happen any time soon.

MartinThoma, Jun 30 '23

OK, no problem. Thanks for your time.

Luisonson, Jul 04 '23