PDFsharp icon indicating copy to clipboard operation
PDFsharp copied to clipboard

do not replace existing XMP metadata if any

Open julienrffr opened this issue 1 year ago • 9 comments

we should not simply replace and lose existing XMP metadata if any already set

NB: this feature is available in PDFsharp-extended nuget package

julienrffr avatar Oct 16 '23 14:10 julienrffr

I'm afraid it is not that simple.
We do not want to keep old PDFsharp XMP metadata either.

ThomasHoevel avatar Oct 16 '23 14:10 ThomasHoevel

I'm afraid it is not that simple. We do not want to keep old PDFsharp XMP metadata either.

XMP metadata can already be filled by the dev which is generating a new document from scratch. It can also come from an already generated document by a third party.

Either case it is bad to just replace as it's currently done.

Until a proper PdfMetadata implementation (which means access properties individually, not setting a plain text XMP XML), we should not simply replace.

Why don't you want to keep old metadata? The only thing that could make sense to change silently is the ModifyDate.

At least dev should have control of how metadata is handled, it should not be just lost.

julienrffr avatar Oct 16 '23 14:10 julienrffr

I'm afraid it is not that simple. We do not want to keep old PDFsharp XMP metadata either.

Another solution would be to add a document option in PdfDocumentOptions => XmpMetadata, which would have 2 values: auto/manual. If manual, XMP would not be overriden.

julienrffr avatar Oct 16 '23 14:10 julienrffr

I updated my pull request.

The behavior is now depending on PdfDocumentOptions.ManualXmpGeneration: true (manual): XMP metadata has to be built and attached manually via PdfMetadata class. false (auto): XMP metadata will be built and attached automatically.

The value is false by default, which means by default XMP metadata will be generated by PdfSharp and attached to the document silently.

This will allow devs who want to build their own XMP metadata to do so, without it being overriden by PdfSharp.

julienrffr avatar Oct 16 '23 15:10 julienrffr

As a developer, who reported this issue I would say thank you for fixing it. The flag will be enough for us to keeping PdfSharp. Maybe just rename it to "PreserveMetadata".

As an architect... well. It depends what do you want to achieve. If you just want to write properties like ProducerTool and ModDate, then overwriting the whole metadata is just not right. The metadata might contain information critical for the document. For example - it might contain some PDF/A conformance info. In my opinion you should first check if metadata exist. If exist, then it should be parsed and updated correctly. It can be only achieved when you have XMP parser. In our solution we use a combination of PdfSharp + XmpCore. Maybe XmpCore could be somehow incorporated into PdfSharp? In such case you won't need flags like "ManualXmpGeneration". Just parse existing metadata, update ProducerTool, serialize it back and write changes to pdf stream.

Also keep in mind that Metadata may appear in /Page objects. Some devices require it for batch printing. I'm modifying it by combination of PdfSharp and XmpCore. It works pretty well.

podprad avatar Oct 16 '23 17:10 podprad

If you just want to write properties like ProducerTool and ModDate, then overwriting the whole metadata is just not right. The metadata might contain information critical for the document. For example - it might contain some PDF/A conformance info. In my opinion you should first check if metadata exist. If exist, then it should be parsed and updated correctly. It can be only achieved when you have XMP parser. In our solution we use a combination of PdfSharp + XmpCore. Maybe XmpCore could be somehow incorporated into PdfSharp? In such case you won't need flags like "ManualXmpGeneration". Just parse existing metadata, update ProducerTool, serialize it back and write changes to pdf stream.

Totally agree, PdfSharp should not brutally overwrite existing metadata set by developer or existing on third party pdf. Indeed partial update of metadata via XmpCore could be interesting, BUT:

  • it is a strong dependency
  • arbitrary decide which metadata is overwritten? and on what schemas (dublin core, xmp 1.0, etc.)?
  • IMO developer should always have control on what is the output. Metadata should not be added at PreSave time without any chance to modify what has been silently changed.

To give an example, in my company, we even want to have control on the ModifyDate in Xmp's metadata. So partial update via XmpCore could be an option (implies some work and arbitrary decisions), but we should still have a flag that allows to completely have control over the Xmp metadata.

julienrffr avatar Oct 17 '23 08:10 julienrffr

XmpCore has BSD license, so maybe the source code can be copied to PdfSharp, without making strong DLL/nuget dependency: https://github.com/drewnoakes/xmp-core-dotnet

"arbitrary decide which metadata is overwritten? and on what schemas (dublin core, xmp 1.0, etc.)?" - derive it from original document, if possible. For new documents it should be allowed to specify the schema (via enum etc.).

"IMO developer should always have control on what is the output. Metadata should not be added at PreSave time without any chance to modify what has been silently changed." - Yes. In my opinion it also applies to ProducerTool. From my perspective I would like to avoid exposing information about our software externally. This is rather security consideration.

podprad avatar Oct 17 '23 08:10 podprad

Totally agree, overwriting existing metadata is not good. Instead of using xmpCore metadata can mayby be updated using XmlDocument.

J3ro3nC avatar Nov 02 '23 16:11 J3ro3nC

Partial update of XMP data is a bit of a work... And still developer should have full control if he wants to not override anything he already put.

NB: this feature is available in PDFsharp-extended nuget package

julienrffr avatar Nov 03 '23 16:11 julienrffr