Enable loading a document from a PdfDocument into a PdfDocumentBuilder

Open EliotJones opened this issue 6 years ago • 4 comments

In order for the library to be useful it must support editing as well as creating and reading. For this we need a way to read an existing document and convert it to a document builder.

EliotJones, Jan 01 '19 17:01

This would make an awesome addition 💯 We're currently using PDFSharpCore, but its interface isn't quite as nice to use and we see a lot of bugs from it.

davidpene, May 21 '20 22:05

Sort of related to this, plus possibly related to merging and/or perhaps #193:

I have a use case (currently done with PDF Clown) for updating the document info dictionary in an existing PDF without making any other changes (that is, both modifying the 'known'/default properties and adding/removing/modifying custom ones).

It's been a while since I looked at the PDFs that PDF Clown produces, but I think the simple cases are handled by appending a new info dictionary to the end of the file, then updating the file to reference that (such that the previous dictionary contents are left in the file as they were). (Actually, I think I raised a bug some time back about PdfPig finding the wrong dictionary instance in some of those PDFs, which was fixed.)

So, is there a simple way to do that with the current PdfPig? (I mention merging etc. as I wonder if you could merge in the info dictionary from a new document? Not sure if that's a bit roundabout though.)
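
For reference, the append-and-repoint approach described above matches what the PDF format calls an incremental update: a new object, a new cross-reference section and a new trailer are appended, and the old bytes are left untouched. Below is a rough, library-independent sketch in Python. It is not PdfPig or PDF Clown code, and the function name `append_info_dictionary` is made up for illustration. It assumes an unencrypted file with a classic `xref` table (not a cross-reference stream), ASCII-only keys and values, and it only carries `/Root` forward from the old trailer (a real tool would also preserve entries such as `/ID`).

```python
import re


def _escape(text):
    # Escape the characters that are special inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return escaped.encode("ascii")


def append_info_dictionary(pdf, info):
    """Append a new info dictionary to `pdf` (bytes) as an incremental update.

    Toy version: assumes a classic xref table and a trailer with /Size and /Root.
    """
    # Read the last trailer and the offset of the last xref section.
    startxref = list(re.finditer(rb"startxref\s+(\d+)", pdf))[-1]
    prev_xref_offset = int(startxref.group(1))
    old_trailer = pdf[pdf.rfind(b"trailer"):startxref.start()]
    size = int(re.search(rb"/Size\s+(\d+)", old_trailer).group(1))
    root = re.search(rb"/Root\s+(\d+\s+\d+\s+R)", old_trailer).group(1)

    # The new info dictionary takes the next unused object number.
    number = size
    entries = b" ".join(
        b"/%s (%s)" % (key.encode("ascii"), _escape(value))
        for key, value in info.items()
    )

    body = pdf if pdf.endswith(b"\n") else pdf + b"\n"
    object_offset = len(body)
    obj = b"%d 0 obj\n<< %s >>\nendobj\n" % (number, entries)

    # One-entry xref section; each entry must be exactly 20 bytes long.
    xref_offset = object_offset + len(obj)
    xref = b"xref\n%d 1\n%010d 00000 n \n" % (number, object_offset)

    # The new trailer points /Info at the new object and /Prev at the old xref.
    trailer = b"trailer\n<< /Size %d /Root %s /Info %d 0 R /Prev %d >>\n" % (
        number + 1, root, number, prev_xref_offset)
    tail = b"startxref\n%d\n%%%%EOF\n" % xref_offset

    return body + obj + xref + trailer + tail
```

Calling `append_info_dictionary(data, {"Title": "New title", "MyCustomKey": "value"})` returns the original bytes followed by the update, so the previous info dictionary stays in the file but is no longer referenced by the latest trailer.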

Numpsy, Aug 03 '20 16:08

Hi @Numpsy, unfortunately I don't think there's any way to do this with the existing PdfPig code. I made a start on what I think needs to be done in https://github.com/UglyToad/PdfPig/tree/edit-existing-documents but I never seem to have time anymore for deep work on PdfPig :(

The aim (sorry for the terrible quality diagram) is to change PdfDocumentBuilder to be an API wrapper that directly modifies an in-memory tree of the underlying PDF objects:

[image: diagram of PdfDocumentBuilder alongside the in-memory tree of PDF objects]

The tree (on the right) is similar to what you'd see if you inspect a PDF document in the iTextRups desktop application. In the case of a new PDF document the tree is empty (except for Catalog/Trailer/InfoDictionary), and operations against the PdfDocumentBuilder add pages, fonts, etc. to the tree; the user can also access the tree directly. For an existing document the tree is fully populated, the PdfDocumentBuilder is populated with the corresponding page and font entries, and it is accessed in the same way as for new documents.
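
To make the idea a bit more concrete, here is a very rough sketch of the shape, written in Python purely as language-neutral pseudocode. None of these names exist in PdfPig; `DocumentBuilder`, `add_page` and the dictionary layout are hypothetical, and the page tree is flattened to a plain list for brevity. The point is only that the builder holds no state of its own: every operation mutates the shared object tree, and the same wrapper works whether the tree starts empty or was loaded from an existing file.

```python
class DocumentBuilder:
    """Hypothetical API wrapper that edits an in-memory tree of PDF objects."""

    def __init__(self, tree=None):
        # A new document starts with just the trailer, catalog and info dictionary.
        # An existing document would pass in its fully populated tree instead.
        if tree is None:
            tree = {
                "Trailer": {"Root": "Catalog", "Info": "Info"},
                "Catalog": {"Type": "Catalog", "Pages": []},
                "Info": {},
            }
        self.tree = tree

    @property
    def pages(self):
        # The builder's view of pages is read straight from the tree.
        return self.tree["Catalog"]["Pages"]

    def add_page(self, width, height):
        # Builder operations add nodes to the tree rather than to private state.
        page = {"Type": "Page", "MediaBox": [0, 0, width, height], "Contents": []}
        self.pages.append(page)
        return page


# New document: the tree only grows through builder calls or direct edits.
builder = DocumentBuilder()
builder.add_page(595, 842)
builder.tree["Info"]["Title"] = "Edited directly on the tree"

# Existing document: wrap an already populated tree and keep editing it.
reopened = DocumentBuilder(tree=builder.tree)
reopened.add_page(595, 842)
```

Because both wrappers share one tree, the page added through `reopened` is also visible through `builder.pages`, which is the behaviour you'd want when the user mixes builder calls with direct tree access.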

I don't actually think this should be a huge amount of work to implement (at least for an alpha version): the classes to load and parse the tree and to write out arbitrary PDF objects to file all exist, they just need linking together. The problem is finding time to do it. As my activity log shows, starting my current job has reduced my time to work on things to an hour or two at weekends.

EliotJones, Aug 09 '20 11:08

For this, would it be better to have a graphics driver approach like PdfSharp, where you create a graphics handler and pens, or should that be internalized? Opinions?

SimantoR, Sep 13 '20 14:09