
ENH: Flatten PDF forms

Status: Open. OpenNingia opened this issue 9 years ago • 38 comments

pdftk provides a feature to embed the form fields' text in the PDF itself. This is very useful if you want to use an editable PDF as a template to be filled in by code.

From the pdftk manual:

[ flatten ]
Use this option to merge an input PDF’s interactive form fields (and their data) with the PDF’s pages. Only one input PDF can be given. Sometimes used with the fill_form operation.

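Proposed usage example:

    # Assumes `source`, `dest` and `fields` (field name -> value) are defined;
    # `flatten_fields=True` is the requested option and does not exist yet.
    from PyPDF2 import PdfFileReader, PdfFileWriter

    writer = PdfFileWriter()
    with open(source, 'rb') as source_fp:
        reader = PdfFileReader(source_fp)

        # Copy every page, filling in its form fields from `fields`
        writer.appendPagesFromReader(
            reader, lambda page: writer.updatePageFormFieldValues(page, fields))

        with open(dest, 'wb') as output_fp:
            writer.write(output_fp, flatten_fields=True)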

OpenNingia, Oct 14 '15

+1. A way to flatten a form would be excellent. I would like to avoid adding another dependency to my code, which uses PyPDF2. But shipping filled-in forms around the interwebz causes problems with a variety of vendors and their software (which I assume is not based on PyPDF2).

whitemice, Feb 02 '16

It would be great if PyPDF2 had the ability to fill in forms and flatten them!

mertz3hack, Mar 16 '16

I would also really appreciate this.

oscardssmith, Jun 15 '16

(In progress) We can accomplish this by setting bit position 1 (ReadOnly) of the field flags (/Ff).

Ref: Table 8.70 of the PDF 1.7 spec

mstamy2, Aug 05 '16
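A minimal sketch of the bit arithmetic behind this, in plain Python with no PDF library. The constant and function names are hypothetical; the values follow the 1-based bit positions of the field-flags table (bit position n is the mask 1 << (n - 1)). Setting ReadOnly only locks the field; the widget itself stays in the file.

```python
# Hypothetical names; values follow the 1-based bit positions of the
# /Ff field flags: bit position n is the mask 1 << (n - 1).
READ_ONLY = 1 << 0  # bit position 1
REQUIRED = 1 << 1   # bit position 2
NO_EXPORT = 1 << 2  # bit position 3


def make_read_only(ff: int) -> int:
    """Return the /Ff value with ReadOnly set, keeping every other flag."""
    return ff | READ_ONLY


def is_read_only(ff: int) -> bool:
    """True if the ReadOnly bit is set in the /Ff value."""
    return bool(ff & READ_ONLY)


flags = REQUIRED               # a required, still editable field: /Ff 2
flags = make_read_only(flags)  # OR in bit position 1
print(flags, is_read_only(flags))  # prints "3 True"
```

Using OR rather than assignment keeps any flags the form author already set (Required, NoExport, and so on).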

Setting a field to read-only might be one way; however, pdftk works differently: AFAIK it replaces each form field instance with a simple text object. :confused:

OpenNingia, Aug 05 '16

You're right, that's the better option. I should be able to implement that soon.

mstamy2, Aug 05 '16

I agree. This would be totally awesome!

nberrios, Nov 04 '16

Is there any update on this? I am looking to use an editable PDF as a template to be filled in by code.

jamoham, Nov 07 '16

I'm with @jamoham on this... for the same exact use case.

kherrett, Dec 14 '16

+1

zhiwehu, Apr 24 '17

Any update on this?

Rob1080, May 27 '17

Can you flatten a file with PyPDF2 yet? I haven't found anything indicating that this has been implemented.

BeGrimm, Apr 17 '18

I do see some _flatten code in PdfFileReader, but not in the writer. Will someone be taking a swing at this?

DrLou, Jan 18 '19

I have exactly the same scenario as mentioned by @jamoham, @kherrett and @zhiwehu above. Has there been any progress on either flattening a PDF or setting the fields as read-only?

Joshua-IRT, Aug 02 '19

Here's a rough bit of code for anyone who needs to set fields to read-only before the module supports it (it assumes you imported the whole module as PyPDF2). It works similarly to the existing updatePageFormFieldValues() method.

import PyPDF2
from PyPDF2.generic import NameObject, NumberObject


class PDFModifier(PyPDF2.PdfFileWriter):
    '''Extends the PyPDF2.PdfFileWriter class and adds functionality missing
    from the PyPDF2 module.'''

    def updatePageFormFieldFlags(self, page, fields, or_existing=True):
        '''
        Update the form field flags (/Ff) for a given page from a fields dictionary.
        Copy field flag values from fields to page.

        :param page: Page reference from PDF writer where the annotations
            and field data will be updated.
        :param fields: a Python dictionary of field names (/T) and flag
            values (/Ff); the flag value should be an unsigned 32-bit integer
            (i.e. a number between 0 and 4294967295)
        :param or_existing: if there are existing flags, OR them with the
            new values (default True)
        '''

        # Pages without annotations have no form fields to update
        if '/Annots' not in page:
            return

        # Iterate through the page's annotations (widgets) and update field flags
        for annot in page['/Annots']:
            writer_annot = annot.getObject()
            name = writer_annot.get('/T')
            if name not in fields:
                continue

            new_flags = fields[name]
            if or_existing:
                # OR with any existing flags without mutating the caller's dict
                new_flags |= writer_annot.get('/Ff', 0)

            writer_annot.update({
                NameObject('/Ff'): NumberObject(new_flags)
            })

Joshua-IRT, Aug 02 '19

+1 for flattening, such as in pdftk!

chickendiver, Dec 02 '19

+1 for a method for flattening PDFs

techNoSavvy-debug, Feb 07 '20

@mstamy2, @OpenNingia

One thing I noticed with the approach of flattening/making forms read-only by setting the field flag bit to 1: when I try to merge the resulting PDFs, only the values from the first document make it into the merged file. I don't think this is expected behavior.

  • pdftk does not seem to have this issue with its approach to flattening.
  • I believe it happens when merging filled PDFs via PyPDF2 because the fields share the same field names. I'm not really sure of the best way around this, besides vaguely trying to emulate pdftk's approach.

paulzuradzki, Mar 07 '22

Cross-posting this useful recipe by @Redjumpman: https://github.com/mstamy2/PyPDF2/issues/506

Remember to update the form field names if you want to merge multiple documents made from the same template form. Otherwise, the merged PDF will have identical pages because every document shares the same field names.

paulzuradzki, Mar 07 '22
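A minimal sketch of that renaming step, assuming plain Python dicts as stand-ins for the PDF's field dictionaries (real code would edit the /T entries through a PDF library). The function name and the "_<index>" suffix scheme are hypothetical; any scheme works as long as the names no longer collide across documents.

```python
def rename_fields(fields, suffix):
    """Append a suffix to every field's partial name (/T), in place.

    `fields` is a list of dicts standing in for PDF field dictionaries.
    Returns a mapping of old name -> new name.
    """
    renamed = {}
    for field in fields:
        name = field.get("/T")
        if name is None:  # e.g. a widget whose name lives on its parent field
            continue
        new_name = f"{name}_{suffix}"
        field["/T"] = new_name
        renamed[name] = new_name
    return renamed


# Two copies of the same template form, filled with different values.
docs = [
    [{"/T": "name", "/V": "Alice"}, {"/T": "age", "/V": "30"}],
    [{"/T": "name", "/V": "Bob"}, {"/T": "age", "/V": "41"}],
]
for index, doc_fields in enumerate(docs):
    rename_fields(doc_fields, index)

all_names = [field["/T"] for doc_fields in docs for field in doc_fields]
print(all_names)  # ['name_0', 'age_0', 'name_1', 'age_1']
```

Only the names change; each document keeps its own values, so after merging they no longer resolve to one shared field.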

PdfWriter.append() should give you the capability to add pages with data fields.

Can you confirm that this issue can be closed?

pubpub-zz, Feb 26 '23

Without feedback, I am closing this issue as fixed. Feel free to provide updates if you want to reopen it.

pubpub-zz, Mar 04 '23

I don't think the original issue is resolved: how do you easily make fields non-editable? The use case is taking a PDF with editable forms, filling out the forms, and outputting a PDF with non-editable fields.

rolisz, Mar 14 '23

The read-only flag is defined in the PDF 1.7 reference (page 676).

Therefore you have to set the flags. Below is an example that sets all the fields to read-only:

import pypdf
from pypdf.generic import NameObject, NumberObject

r = pypdf.PdfReader("input_form.pdf")
for f, v in r.get_fields().items():
    o = v.indirect_reference.get_object()  # access to the actual PDF field dictionary
    # Bit position 1 of /Ff is ReadOnly; OR it into the existing flags
    o[NameObject("/Ff")] = NumberObject(o.get("/Ff", 0) | 1)
w = pypdf.PdfWriter()
w.clone_document_from_reader(r)
w.write("output_form.pdf")

pubpub-zz, Mar 14 '23

What you are suggesting is not "flattening", though. The output PDF will still contain data fields (widgets). Flattening, as pdftk does it, replaces the data fields with text.

OpenNingia, Mar 14 '23

@OpenNingia Can you provide a non-flat PDF file and its flattened version for review?

pubpub-zz, Mar 15 '23

Multiple PDFs merged and flattened: Ichiro Yasuhigo.pdf

One of the editable sources: sheet_all.pdf

OpenNingia, Mar 17 '23

The flattening process is quite tough to implement: create XObjects with the right characteristics, then modify the page content to place them. Personally, I see very limited advantage compared with the time needed to implement it, and for me the read-only alternative could be sufficient. I will not have time to propose a PR. Any candidates?

pubpub-zz, Mar 17 '23

Since we now have #1864, flattening should be quite simple.

pubpub-zz, Jun 25 '23

Can someone please provide a simple code snippet here for flattening a pdf?

rohit11544, Dec 01 '23