PdfPig
Enable loading a document from a PdfDocument into a PdfDocumentBuilder
In order for the library to be useful it must support editing as well as creating and reading. For this we need a way to read an existing document and convert it to a document builder.
This would make an awesome addition 💯 We're currently using PDFSharpCore but it doesn't have quite as nice of an interface to use and we see a lot of bugs from it.
Sort of related to this, plus possibly related to merging and/or perhaps #193:
I have a use case (currently done with PDF Clown) for updating the document info dictionary in an existing PDF, but not making any other changes (that is both modifying the 'known'/default properties and adding/removing/modifying custom ones).
It's been a while since I looked at the PDFs that PDF Clown produces, but I think the simple cases are handled by appending a new info dictionary to the end of the file and then updating the file to reference it, so that the previous dictionary contents are left in the file as they were. (Actually, I think I raised a bug some time back about PdfPig finding the wrong dictionary instance in some of those PDFs, which was fixed.)
So, is there a simple way to do that with the current PdfPig? (I mention merging etc. as I wonder if you could merge in the info dictionary from a new document. Not sure if that's a bit roundabout, though.)
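For reference, the append-only approach described above (an incremental update, in PDF terms) can be sketched roughly as follows. This is a minimal illustration in Python, not PdfPig or PDF Clown code: the function names are hypothetical, and it assumes the caller has already read the previous `startxref` offset, `/Size` and `/Root` from the existing trailer, that the file uses a classic cross-reference table rather than a cross-reference stream, and that the values fit in Latin-1. A real implementation would also need to carry over trailer entries such as `/ID` and `/Encrypt`.

```python
def escape_pdf_string(text: str) -> bytes:
    """Escape a value for use inside a PDF literal string: (...)."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # Assumption: the text is representable in Latin-1.
    return escaped.encode("latin-1")


def append_info_dictionary(original: bytes, prev_xref_offset: int, size: int,
                           root_obj: int, info: dict) -> bytes:
    """Append a new info dictionary without touching the original bytes.

    prev_xref_offset, size and root_obj are assumed to come from the
    existing file's startxref value and trailer (/Size, /Root).
    """
    new_obj = size  # /Size is one more than the highest object number in use
    out = bytearray(original)
    if not out.endswith(b"\n"):
        out += b"\n"

    # 1. The new info dictionary, written as a brand new indirect object.
    obj_offset = len(out)
    entries = b" ".join(
        b"/" + key.encode("ascii") + b" (" + escape_pdf_string(value) + b")"
        for key, value in info.items()
    )
    out += b"%d 0 obj\n<< " % new_obj + entries + b" >>\nendobj\n"

    # 2. A cross-reference section that lists only the new object.
    xref_offset = len(out)
    out += b"xref\n%d 1\n" % new_obj
    out += b"%010d %05d n \n" % (obj_offset, 0)  # each entry is exactly 20 bytes

    # 3. A trailer that points at the new dictionary and chains to the old xref.
    out += b"trailer\n<< /Size %d /Root %d 0 R /Info %d 0 R /Prev %d >>\n" % (
        new_obj + 1, root_obj, new_obj, prev_xref_offset)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)
```

The old info dictionary stays in the file untouched; a reader that follows the last `startxref` sees the new trailer first and so picks up the new `/Info` reference.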
Hi @Numpsy unfortunately I don't think there's any way to do this with the existing PdfPig code currently. I made a start on what I think needs to be done in https://github.com/UglyToad/PdfPig/tree/edit-existing-documents but I never seem to have time anymore for deep work on PdfPig :(
The aim (sorry for the terrible quality diagram) is to change PdfDocumentBuilder to be an API wrapper that directly modifies an in-memory tree of the underlying PDF objects:

The tree (on the right) is similar to what you'd see if you inspect a PDF document in the iTextRups desktop application. In the case of a new PDF document the tree is empty (except for Catalog/Trailer/InfoDictionary), operations against the PdfDocumentBuilder add pages, fonts, etc. to the tree, and the user can also access the tree directly. For an existing document the tree is fully populated, and the PdfDocumentBuilder is populated with the corresponding pages and fonts entries and accessed in the same way as for new documents.
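To make the idea concrete, here is a rough sketch of a builder that is only a thin wrapper over an object tree. It is written in Python for brevity and is not the code on the branch: `PdfObjectTree` and `DocumentBuilder` are hypothetical names, PDF dictionaries are modelled as plain `dict`s, indirect references as bare object numbers, and the page tree is assumed to be flat (real page trees can be nested).

```python
from dataclasses import dataclass, field


@dataclass
class PdfObjectTree:
    """Hypothetical in-memory tree of raw PDF objects, keyed by object number."""
    objects: dict = field(default_factory=dict)
    trailer: dict = field(default_factory=dict)

    def add(self, obj: dict) -> int:
        """Store an object and return its newly allocated object number."""
        number = max(self.objects, default=0) + 1
        self.objects[number] = obj
        return number

    @classmethod
    def empty(cls) -> "PdfObjectTree":
        """Tree for a new document: only catalog, page root and info dictionary."""
        tree = cls()
        pages = tree.add({"Type": "Pages", "Kids": [], "Count": 0})
        catalog = tree.add({"Type": "Catalog", "Pages": pages})
        info = tree.add({})
        tree.trailer = {"Root": catalog, "Info": info}
        return tree


class DocumentBuilder:
    """API wrapper; every operation reads from or writes to the tree."""

    def __init__(self, tree: PdfObjectTree = None):
        # New document: start from the near-empty tree.
        # Existing document: wrap the fully populated, parsed tree.
        self.tree = tree if tree is not None else PdfObjectTree.empty()

    @property
    def _pages_node(self) -> dict:
        catalog = self.tree.objects[self.tree.trailer["Root"]]
        return self.tree.objects[catalog["Pages"]]

    @property
    def pages(self) -> list:
        """Page dictionaries, resolved from the tree on every access."""
        return [self.tree.objects[n] for n in self._pages_node["Kids"]]

    def add_page(self, width: float, height: float) -> int:
        parent = self.tree.objects[self.tree.trailer["Root"]]["Pages"]
        number = self.tree.add(
            {"Type": "Page", "Parent": parent, "MediaBox": [0, 0, width, height]})
        node = self._pages_node
        node["Kids"].append(number)
        node["Count"] += 1
        return number
```

Because the builder holds no state of its own, `DocumentBuilder()` and `DocumentBuilder(parsed_tree)` behave identically afterwards, and anything the user changes directly on `builder.tree` is visible through the builder straight away.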
I don't actually think this should be a huge amount of work to implement (at least for an alpha version): the classes to load and parse the tree and to write out arbitrary PDF objects to file all exist; they just need linking together. The problem is finding time to do it. As my activity log shows, starting my current job has reduced my time to work on things to an hour or two at weekends.
For this, would it be better to have a graphics driver approach like PdfSharp, where you create a graphics handler and pens, or should that be internalized? Opinions?