pypdf Support for Optional Content Groups

PyPDF2 does not currently have any support for Optional Content Groups (OCGs). When merging multiple documents into a single document the layers are effectively flattened and functionality is lost.

See section 4.10.2, "Making Graphical Content Optional", of the PDF Reference 1.7: http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/pdf_reference_1-7.pdf

Apr 22 '14 12:04 snorfalorpagus

Thanks. We definitely need support for layered PDFs to display correctly (and possibly support for adding/removing layers).

Apr 22 '14 22:04 mstamy2

It looks like the OCG settings are stored in the 'OCProperties' dictionary in 'Root' - see dump below.

The problem here is that it uses IndirectObjects, which don't necessarily have the same ID in the output PDF as in the input PDF once the page is appended. How do we get the ID of the corresponding new object in the output PDF?

{'/OCProperties': {'/D': {'/ListMode': '/VisiblePages',
                          '/Locked': [IndirectObject(8, 0),
                                      IndirectObject(9, 0)],
                          '/OFF': [IndirectObject(11, 0),
                                   IndirectObject(12, 0)],
                          '/Order': [IndirectObject(1, 0),
                                     [IndirectObject(2, 0),
                                      IndirectObject(3, 0),
                                      IndirectObject(4, 0)],
                                     [u'PDF Drawing Layer',
                                      IndirectObject(5, 0),
                                      IndirectObject(6, 0),
                                      IndirectObject(7, 0),
                                      IndirectObject(8, 0),
                                      IndirectObject(9, 0)],
                                     IndirectObject(10, 0),
                                     [IndirectObject(11, 0),
                                      IndirectObject(12, 0),
                                      IndirectObject(13, 0)]],
                          '/RBGroups': [[IndirectObject(11, 0),
                                         IndirectObject(12, 0),
                                         IndirectObject(13, 0)]]},
                   '/OCGs': [IndirectObject(7, 0),
                             IndirectObject(3, 0),
                             IndirectObject(9, 0),
                             IndirectObject(11, 0),
                             IndirectObject(1, 0),
                             IndirectObject(8, 0),
                             IndirectObject(6, 0),
                             IndirectObject(4, 0),
                             IndirectObject(12, 0),
                             IndirectObject(2, 0),
                             IndirectObject(10, 0),
                             IndirectObject(13, 0),
                             IndirectObject(5, 0)]},
 '/OpenAction': {'/D': [IndirectObject(25, 0), '/Fit'], '/S': '/GoTo'},
 '/PageLayout': '/SinglePage',
 '/PageMode': '/UseOC',
 '/Pages': IndirectObject(24, 0),
 '/Type': '/Catalog',
 '/ViewerPreferences': {'/NonFullScreenPageMode': '/UseNone'}}
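
One way to think about the ID question: build a table from input object numbers to output object numbers while copying the OCG dictionaries, then rewrite every reference in /OCProperties through that table. The sketch below shows only the rewriting step on plain Python data. Ref, remap and the output numbers in id_map are hypothetical stand-ins, not PyPDF2 objects or API.

```python
from typing import NamedTuple


class Ref(NamedTuple):
    """Hypothetical stand-in for an indirect reference (object number, generation)."""
    idnum: int
    generation: int = 0


def remap(node, id_map):
    """Return a copy of `node` with every Ref translated through `id_map`."""
    if isinstance(node, Ref):
        return Ref(id_map[node.idnum], node.generation)
    if isinstance(node, list):
        # /Order nests arrays, so recurse into each element
        return [remap(item, id_map) for item in node]
    if isinstance(node, dict):
        return {key: remap(value, id_map) for key, value in node.items()}
    return node  # names and labels such as 'PDF Drawing Layer' pass through


# A small /Order-like structure and made-up output object numbers.
order = [Ref(1), [Ref(2), Ref(3)], ["PDF Drawing Layer", Ref(5)]]
id_map = {1: 40, 2: 41, 3: 42, 5: 43}
new_order = remap(order, id_map)
```

The same table has to be applied to /OCGs, /Locked, /OFF, /Order and /RBGroups alike, otherwise some arrays would still point at input-file object numbers.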

Jun 10 '14 20:06 snorfalorpagus

Screenshot below of the PDF in Acrobat Reader (on Linux) that was used for the dump above.

[Image: layer_pdf]

Jun 10 '14 20:06 snorfalorpagus

How did you get the dumped structure?

Jul 28 '15 05:07 emmama1234

@emmama1234 I've got the dumped structure like this:

from PyPDF2 import PdfFileReader
reader = PdfFileReader(open('test.pdf', 'rb'))
reader.trailer['/Root']['/OCProperties']

Jul 28 '15 12:07 snorfalorpagus

@snorfalorpagus did you find any way to add/remove OCG layers in a multi-page PDF with PyPDF2?

Aug 20 '15 20:08 emmama1234

I didn't get any further than viewing the data as posted above. :(

Aug 21 '15 13:08 snorfalorpagus

@snorfalorpagus Thanks! I'll take a look at it to see if I can find something.

Aug 21 '15 17:08 emmama1234

As this feature request didn't receive an update for a long time, I'm closing it.

I'm linking it in https://github.com/py-pdf/PyPDF2/discussions/1181 so that we don't forget about it. Please feel free to add more information (PDFs that use it; other projects that implement it; explanations of how it would improve PyPDF2).

Jul 29 '22 18:07 MartinThoma

@snorfalorpagus @emmama1234 Hi, I know it's been a while, but did either of you ever get further with this? I'm working on the same thing and used reader.trailer['/Root']['/OCProperties'] and reader._get_object to get all the direct object references. I then re-mapped them to the writer using writer._add_object and writer._root_object['/OCProperties']. The PDF still has trouble opening, but I feel like I'm close. Do any of you have any suggestions? I can share the Python code too if that helps.

@MartinThoma would love for this to be implemented in pypdf.

Oct 24 '24 16:10 mmalik1234

You are of course always invited to provide a corresponding PR to add such support.

Oct 24 '24 17:10 stefan6419846

@stefan6419846 Hi, I saw it was linked in https://github.com/py-pdf/pypdf/discussions/1181. My code does not work yet and I'm still stuck. The output PDF still does not have layer information and is blank.

Here's what I am doing so far:

import pypdf
from pypdf.generic import ArrayObject, DictionaryObject, NameObject

def get_ocgs_direct(reader):
    ocgs_props = DictionaryObject({})
    if "/OCProperties" in reader.trailer["/Root"]:
        ocgs_props = reader.root_object["/OCProperties"]
        if (len(ocgs_props) > 0):
            # get direct objects for ocgs
            for i, indirect in enumerate(ocgs_props["/OCGs"]):
                pdfobject = reader.get_object(indirect)
                ocgs_props[NameObject("/OCGs")][i] = pdfobject

            # repeat for order
            for i, indirect in enumerate(ocgs_props["/D"]["/Order"]):
                if isinstance(indirect, pypdf.generic._data_structures.ArrayObject):
                    # nested lists to resolve
                    arr = ArrayObject([reader.get_object(indirect[0])])
                    arr.append(ArrayObject([reader.get_object(indirect[1][0])]))
                    ocgs_props[NameObject("/D")][NameObject("/Order")][i] = ArrayObject(arr)
                else:
                    pdfobject = reader.get_object(indirect)
                    ocgs_props[NameObject("/D")][NameObject("/Order")][i] = pdfobject

    return ocgs_props

def set_ocgs_direct(writer, ocgs_direct):
    # re-reference OCGs in the writer PDF using _add_object
    for i, direct in enumerate(ocgs_direct["/OCGs"]):
        indirectobject = writer._add_object(DictionaryObject(direct))  # find out name object type
        ocgs_direct[NameObject("/OCGs")][i] = indirectobject

    # should update ["/D"]["/Order"] already.
    for i, direct in enumerate(ocgs_direct["/D"]["/Order"]):
        if isinstance(direct, pypdf.generic._data_structures.ArrayObject):
            # nested lists to resolve
            direct[0] = writer._add_object(DictionaryObject(direct[0]))
            direct[1][0] = writer._add_object(DictionaryObject(direct[1][0]))
        else:
            indirect = writer._add_object(DictionaryObject(direct))
            ocgs_direct[NameObject("/D")][NameObject("/Order")][i] = indirect

    if "/OCProperties" in writer.root_object.keys():
        writer.root_object[NameObject("/OCProperties")].update(ocgs_direct)
    else:
        writer._root_object[NameObject("/OCProperties")] = DictionaryObject(ocgs_direct)

Oct 25 '24 19:10 mmalik1234

@mmalik1234 I am going to re-open this issue for now as there seems to be further interest, although this will probably only be supported if it is contributed as a PR and is easy enough to maintain. I cannot help much with this, as I have no use case that would use OCGs.

Having a quick look at your code, you probably should not have to use the full class path for ArrayObject in the isinstance (although only related to style). Additionally, the last condition looks wrong with the backslash as key.

Oct 25 '24 19:10 stefan6419846

@stefan6419846 Thanks. This comment from 1181 shows my use case. https://github.com/py-pdf/pypdf/discussions/1181#discussioncomment-3408544

Oct 25 '24 20:10 mmalik1234

My use case is that I set page numbers or other overlay information on each page. I want to put such overlays on a named layer for easy identification and to enable/disable viewing and printing of them.

Specifically, I want to be able to identify the layer so that I can remove the overlay again, e.g. inserting new page numbers after removing the old ones.

Thus, I need to be able to add the layer and ensure I can draw overlays on it.

Another way to identify and remove an overlay might also work, but layers / OCGs are the current idea.
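
For drawing on a layer, the PDF specification marks optional content inside a content stream with the /OC tag: the operators sit between "/OC /Name BDC" and "EMC", where /Name is looked up in the /Properties subdictionary of the page's resources and points to the OCG. A minimal sketch of just the byte-level wrapping follows. wrap_in_layer, /PageNumbers and /F1 are hypothetical names, and registering the OCG, the /Properties entry and the font is not shown.

```python
def wrap_in_layer(content: bytes, property_name: str) -> bytes:
    """Bracket content-stream operators with /OC ... BDC and EMC."""
    if not property_name.startswith("/"):
        raise ValueError("property_name must be a PDF name such as '/PageNumbers'")
    return (
        b"/OC " + property_name.encode("ascii") + b" BDC\n"
        + content.strip()
        + b"\nEMC\n"
    )


# Operators that draw a page number; /F1 is assumed to exist in the page's fonts.
page_number = b"BT /F1 10 Tf 280 20 Td (Page 1) Tj ET"
layered = wrap_in_layer(page_number, "/PageNumbers")
```

Because the overlay is bracketed by a known name, a later pass could in principle locate that marked-content section again, which is the identification step the use case asks for.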

Mar 19 '25 20:03 osos

I would like this to happen! I added an issue in pdfly before figuring out it needs support here first. https://github.com/py-pdf/pdfly/issues/190

My use case has to do with the PDF exports from AutoCAD which put each CAD layer (windows, doors, walls) on separate PDF layers / OCGs, and I would like the ability to work with them programmatically: toggle on/off, remove, add, copy to a new file, etc.

pymupdf has the ability to toggle visibility, but I'm not sure about the rest:

import pymupdf
doc = pymupdf.open("file.pdf")
ocgs = doc.get_ocgs()  # dict keyed by each OCG's xref
lyr_xref = next(iter(ocgs))  # xref of the first OCG
doc.set_layer(-1, on=[lyr_xref], basestate="OFF")  # turn off all but the first layer
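
At the dictionary level, a viewer derives each layer's initial state from the default configuration /D: /BaseState (default /ON) sets every OCG, then the /ON and /OFF arrays override it. A minimal sketch on plain dicts, with integers standing in for indirect references; initial_states is a hypothetical helper, not a pymupdf or pypdf function.

```python
def initial_states(oc_properties):
    """Map each OCG in /OCGs to True (ON), False (OFF) or None (left unchanged)."""
    config = oc_properties.get("/D", {})
    base = config.get("/BaseState", "/ON")  # /ON is the default
    start = {"/ON": True, "/OFF": False, "/Unchanged": None}[base]
    states = {ocg: start for ocg in oc_properties.get("/OCGs", [])}
    # The /ON and /OFF arrays override the base state for the listed OCGs.
    for ocg in config.get("/ON", []):
        states[ocg] = True
    for ocg in config.get("/OFF", []):
        states[ocg] = False
    return states


# Shape of the dump earlier in the thread: 13 OCGs, no /BaseState, two in /OFF.
dumped = {"/OCGs": list(range(1, 14)), "/D": {"/OFF": [11, 12]}}
dump_states = initial_states(dumped)

# "All off except the first layer", expressed as a default configuration.
first_only = {"/OCGs": [1, 2, 3], "/D": {"/BaseState": "/OFF", "/ON": [1]}}
first_only_states = initial_states(first_only)
```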

In pypdf I can retrieve these objects pretty easily, but how can I actually do things with them?

import pypdf
doc = pypdf.PdfReader("file.pdf")
ocgs = doc.root_object.get_object()["/OCProperties"]["/OCGs"]
xrefs = [ocg.idnum for ocg in ocgs]
layers = [doc.get_object(xref)['/Name'] for xref in xrefs]
objs = [doc.get_object(xref) for xref in xrefs]

Oct 20 '25 05:10 p-vdp

In pypdf, you mostly need to know the official PDF specification and understand it to work with this, although it always is hard to help without an actual example file.

Section 8.11 of the PDF 2.0 specification defines the necessary aspects. The visibility policy for an optional content membership dictionary (OCMD) is defined by its /P key in the simplest case, which is set to one of /AllOn, /AnyOn, /AnyOff or /AllOff. This can be overridden by the /VE key, which defines a more complex visibility expression.
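
To make the four /P values concrete, here is a minimal sketch that evaluates a policy against the ON/OFF states of the OCGs an OCMD refers to. ocmd_visible is a hypothetical helper, not part of pypdf, and it deliberately ignores /VE.

```python
def ocmd_visible(ocg_states, policy="/AnyOn"):
    """Evaluate /P for an OCMD; `ocg_states` holds True for ON, False for OFF.

    /AnyOn is the default policy when /P is absent.
    """
    if not ocg_states:
        # Without usable /OCGs entries the OCMD does not affect visibility.
        return True
    if policy == "/AllOn":
        return all(ocg_states)
    if policy == "/AnyOn":
        return any(ocg_states)
    if policy == "/AnyOff":
        return not all(ocg_states)
    if policy == "/AllOff":
        return not any(ocg_states)
    raise ValueError(f"unknown visibility policy: {policy!r}")


mixed = [True, False]  # one referenced OCG is ON, one is OFF
results = {p: ocmd_visible(mixed, p) for p in ("/AllOn", "/AnyOn", "/AnyOff", "/AllOff")}
```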

At the moment, every action should be possible in theory if you use the low-level interface where necessary. The main issue might be that we do not have a high-level interface at the moment.

Oct 20 '25 06:10 stefan6419846

@stefan6419846 Thanks! I'll look into the API and see what I can find.

Here's a sample file: building001-0_floor1.pdf

You can find other CAD examples here and export to PDF with TrueView (gratis) to see the layers situation.

Oct 20 '25 09:10 p-vdp

Just getting my bearings in the codebase and the reference spec.... Could you perhaps point me to a basic example of what you mean by the "low-level interface?"

In constants.py I see the implementations of various spec tables. Am I correct that Section 8.11 would need to be added here? For example, I'm not seeing an implementation of Table 97 for the OCG visibility policy, just OC_PROPERTIES in CatalogDictionary.

Oct 21 '25 03:10 p-vdp

For example, given that you have an OCMD (obj['/Type'] == '/OCMD'), you could access the policy with obj['/P'] and update it with obj[NameObject('/P')] = NameObject('/AllOn').

am I correct that Section 8.11 would need to be added here?

Yes, constants/names would be added there, but it is not strictly necessary for the basic usage. If we add proper support for OCGs to pypdf, we would have to define a suitable interface first anyway.

You can find other CAD examples here and export to PDF with TrueView (gratis) to see the layers situation.

I do not have a suitable Windows machine available, but as I am most likely not going to implement the new functionality myself anyway for various reasons, it might help interested community members to look into it and evaluate implementations.

Oct 21 '25 14:10 stefan6419846
