
[Question] Availability of complete RELAX-NG schemas to validate PDF XMP metadata

Open ceztko opened this issue 9 months ago • 5 comments

ISO 16684-2:2014 describes the use of RELAX-NG schemas to validate XMP metadata. The standard is not PDF-focused, but it provides a sample schema that validates basic XMP properties that are also used in PDF (e.g. dc:title, dc:creator, xmp:ModifyDate). Since its license is permissive, I extracted it and made it publicly available. That schema does not include extensions covering the other PDF properties, including those described by the PDF/A standards (e.g. pdf:Trapped, pdfaid:part). I recently noticed that extension schemas (in "PDF/A extension schema container schema" format) were published for properties shared across formats (e.g. basic PDF, PDF/A, PDF/X), but the main question is: are there RELAX-NG schemas that describe all the XMP properties mentioned in the PDF, PDF/A, PDF/X, etc. standards? I also submitted the same question to the Adobe XMP GitHub project, to no avail.

ceztko avatar Mar 07 '25 14:03 ceztko

To be very clear: PDF/A-4 only recommends ("should") the use of an ISO 16684-2 RELAX-NG schema (as an Associated File with AFRelationship == Schema) for custom XMP metadata. There is also an open question as to whether this would then force such PDF/A-4 files to always be conformance level PDF/A-4f due to the AF...

This is unlike all prior versions (PDF/A-1, PDF/A-2, and PDF/A-3) which all required ("shall") that all XMP custom extensions have their full schema declared in XMP Extension Schema syntax (as documented in the PDF/A-1, PDF/A-2, and PDF/A-3 specs) in an XMP metadata stream (so not as a standalone schema). Templates for this are available from: https://pdfa.org/free-xmp-extension-schema-templates/
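As a rough illustration of the "declared in an XMP metadata stream" requirement, the sketch below lists the namespace URIs that a packet declares through `pdfaExtension:schemas`. It assumes the usual PDF/A extension schema namespaces and only handles the common serialization where each `rdf:li` uses `rdf:parseType="Resource"`; the function name, the sample packet, and the `http://example.com/ns/custom/` namespace are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by the PDF/A extension schema container.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "pdfaExtension": "http://www.aiim.org/pdfa/ns/extension/",
    "pdfaSchema": "http://www.aiim.org/pdfa/ns/schema#",
}

# Hypothetical packet declaring one custom extension schema.
SAMPLE_PACKET = """\
<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:pdfaExtension="http://www.aiim.org/pdfa/ns/extension/"
      xmlns:pdfaSchema="http://www.aiim.org/pdfa/ns/schema#">
   <pdfaExtension:schemas>
    <rdf:Bag>
     <rdf:li rdf:parseType="Resource">
      <pdfaSchema:schema>Example schema</pdfaSchema:schema>
      <pdfaSchema:namespaceURI>http://example.com/ns/custom/</pdfaSchema:namespaceURI>
      <pdfaSchema:prefix>ex</pdfaSchema:prefix>
     </rdf:li>
    </rdf:Bag>
   </pdfaExtension:schemas>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>
"""


def declared_extension_namespaces(packet):
    """Return the set of namespace URIs declared via pdfaExtension:schemas.

    Only the rdf:parseType="Resource" form of each rdf:li is handled;
    other equivalent RDF serializations are ignored by this sketch.
    """
    root = ET.fromstring(packet)
    uris = set()
    for li in root.findall(".//pdfaExtension:schemas/rdf:Bag/rdf:li", NS):
        uri = li.findtext("pdfaSchema:namespaceURI", default="", namespaces=NS)
        if uri.strip():
            uris.add(uri.strip())
    return uris


if __name__ == "__main__":
    print(declared_extension_namespaces(SAMPLE_PACKET))
```

This only reports which namespaces are declared; it does not check the declared properties or value types.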

petervwyatt avatar Mar 08 '25 04:03 petervwyatt

I should have explained better what I wanted to do, as the templates in [1] are not enough for me: basically, I want to validate the whole XMP packet using the method described in ISO 16684-2:2014, which suggests a procedure to normalize the XMP packet so that there are no ambiguities when validating it against a RELAX-NG grammar. I implemented the procedure myself a couple of years ago (I believe it's still the only public implementation of that standard), although I still have to check some edge cases. Now I would like to use that procedure, together with a standalone RELAX-NG grammar describing the XMP properties used in the PDF/A profiles, to validate the XMP packet against those profiles. I understand that XMP metadata can be extended through extensions, and that such standalone grammars won't include those, but as a first step it would be enough to validate against a non-extended grammar. I believe such grammars still don't exist, at least not as open source or free downloads, but it would not be a big task to craft them, based on the basic free grammar I extracted from ISO 16684-2. I could actually start working on this in a few weeks, unless you can point me to other public resources I am not aware of.

[1] https://pdfa.org/free-xmp-extension-schema-templates/
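To make the gap between a basic grammar and an extended one concrete, here is a minimal sketch that lists the top-level XMP properties in a packet whose namespaces fall outside a base set. It does not implement the ISO 16684-2 normalization procedure or any RELAX-NG validation. The choice of `BASE_NAMESPACES` (only `dc` and `xmp`) is an assumption made for illustration, not the actual coverage of the ISO sample schema, and the sample packet and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Assumption for illustration only: namespaces treated as covered by a base grammar.
BASE_NAMESPACES = {
    "http://purl.org/dc/elements/1.1/",  # dc
    "http://ns.adobe.com/xap/1.0/",      # xmp
}

# Hypothetical packet mixing base properties with PDF and PDF/A ones.
SAMPLE_PACKET = """\
<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:xmp="http://ns.adobe.com/xap/1.0/"
      xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
      xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/"
      pdfaid:part="2">
   <dc:title><rdf:Alt><rdf:li xml:lang="x-default">Sample</rdf:li></rdf:Alt></dc:title>
   <xmp:ModifyDate>2025-03-07T14:03:00Z</xmp:ModifyDate>
   <pdf:Trapped>False</pdf:Trapped>
   <pdfaid:conformance>B</pdfaid:conformance>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>
"""


def split_qname(qname):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if qname.startswith("{"):
        ns, local = qname[1:].split("}", 1)
        return ns, local
    return "", qname


def top_level_properties(packet):
    """Collect (namespace, local name) for every top-level XMP property.

    Properties may be written as child elements or as attributes of the
    top-level rdf:Description elements; both forms are collected.
    """
    root = ET.fromstring(packet)
    props = set()
    for rdf in root.iter(f"{{{RDF_NS}}}RDF"):
        for desc in rdf.findall(f"{{{RDF_NS}}}Description"):
            for child in desc:
                props.add(split_qname(child.tag))
            for attr in desc.attrib:
                ns, local = split_qname(attr)
                # Skip rdf:about, xml:lang and unqualified attributes.
                if ns and ns not in (RDF_NS, XML_NS):
                    props.add((ns, local))
    return props


def uncovered_properties(packet, base=BASE_NAMESPACES):
    """Return the sorted properties whose namespace is outside the base set."""
    return sorted(p for p in top_level_properties(packet) if p[0] not in base)


if __name__ == "__main__":
    for ns, local in uncovered_properties(SAMPLE_PACKET):
        print(ns, local)
```

For the sample packet this reports `pdf:Trapped`, `pdfaid:conformance` and `pdfaid:part`, which are the kind of properties an extended grammar would need to describe.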

ceztko avatar Mar 08 '25 13:03 ceztko

Also: this is not really PDF/A-4 specific, as I want to validate PDF/A-1, PDF/A-2, PDF/A-3, so I believe you can remove that tag.

ceztko avatar Mar 08 '25 13:03 ceztko

OK - I misunderstood what you meant...

AFAIK (and as you state) there are no official or unofficial RELAX-NG schemas for any of the PDF subsets - but very happy if anyone in the community wishes to offer up theirs...

petervwyatt avatar Mar 09 '25 01:03 petervwyatt

As promised, the WIP schema template (with links to current schema snapshots) can be found in this repository:

https://github.com/ceztko/XMP-RNG-Schema

ceztko avatar Jul 15 '25 08:07 ceztko