Is the StructTreeRoot an "element of the logical structure" that can have an associated file in PDF/A-3a?
The attached PDF 1.7 file claims PDF/A-3a conformance and contains four associated files. Two of them are referenced from a Formula structure element:
/Type /StructElem /AF [11 0 R 20 0 R] /S /Formula ....
The other two are referenced from the StructTreeRoot:
/Type /StructTreeRoot /AF [33 0 R 35 0 R]
The AF entries are placed there because Deriving HTML from PDF recommends this for content that should go into the head of the HTML output:
The structure tree root element may have one or more associated files specified via an AF entry. These AF entries shall be processed to build the head element of the HTML output (see 4.6, "Associated file processing").
Validation with veraPDF fails on the latter two, citing:
Specification: ISO 19005-3:2012, Clause: 6.8, Test number: 4
The additional information provided for associated files as well as the usage requirements for associated files indicate the relationship between the embedded file and the PDF document or the part of the PDF document with which it is associated
Among the places where the relevant rule permits AF, it lists:
Structure element dictionary, if the file is associated with any element in the document's logical structure
It is not clear whether "any element" includes the StructTreeRoot.
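To make the disputed case concrete, here is a minimal sketch of the kind of check a validator performs. It uses a hypothetical, simplified object model (plain Python dicts standing in for PDF dictionaries, indirect references already resolved, no /P back-pointers so the graph is a tree); it is not a PDF parser. The permitted-owner set holds only the two cases discussed in this thread (the document Catalog and structure elements) and is an assumption, not the full list from ISO 19005-3 Annex E. It also assumes structure elements carry the optional /Type /StructElem key.

```python
from typing import Any, Iterator

# Hypothetical model: PDF dictionaries as dicts keyed by names like "/Type", "/AF".
PdfDict = dict[str, Any]

# Assumption: only the AF owners discussed in this thread, NOT the complete
# list of permitted locations from ISO 19005-3 Annex E.
PERMITTED_AF_OWNERS = {"/Catalog", "/StructElem"}


def af_owners(obj: Any, path: str = "") -> Iterator[tuple[str, str]]:
    """Yield (path, /Type) for every dictionary that carries an /AF entry."""
    if isinstance(obj, dict):
        if "/AF" in obj:
            yield path or "/", obj.get("/Type", "(untyped)")
        for key, value in obj.items():
            if key != "/AF":  # don't descend into the file specifications themselves
                yield from af_owners(value, f"{path}{key}")
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            yield from af_owners(item, f"{path}[{i}]")


def disallowed_af(catalog: PdfDict) -> list[str]:
    """Return paths of /AF entries whose owner is outside PERMITTED_AF_OWNERS."""
    return [p for p, t in af_owners(catalog) if t not in PERMITTED_AF_OWNERS]


# Shape of the file from the question (object numbers kept as comments only).
catalog = {
    "/Type": "/Catalog",
    "/StructTreeRoot": {
        "/Type": "/StructTreeRoot",
        "/AF": [{"/Type": "/Filespec"}, {"/Type": "/Filespec"}],  # 33 0 R, 35 0 R
        "/K": [{
            "/Type": "/StructElem",
            "/S": "/Formula",
            "/AF": [{"/Type": "/Filespec"}, {"/Type": "/Filespec"}],  # 11 0 R, 20 0 R
        }],
    },
}
print(disallowed_af(catalog))  # ['/StructTreeRoot']
```

Under this reading, the Formula element's AF passes and only the StructTreeRoot's AF is reported, which matches what veraPDF flags.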
We fail that document too. I think it's that recommendation in Deriving HTML from PDF that's at fault; ISO 19005-3 Annex E is pretty clear that it can only be structure elements, not the StructTreeRoot.
The Deriving HTML from PDF document assumes PDF 2.0 / PDF/UA-2, though, not PDF 1.7 and PDF/A-3, so it's not at fault. It just means we have to generate PDF 2.0, doesn't it?
Well, more precisely, the title page of Deriving HTML from PDF says "A usage specification for tagged ISO 32000-2 files", although inside it does suggest that files conforming to ISO 32000-1 and PDF/UA-1 might also work.
Fair point. So I suppose there are actually two issues here:
- Can you attach a file to the StructTreeRoot and have the file be valid PDF/A-3? No, ISO 19005-3 says you can't.
- Is Deriving HTML from PDF wrong to recommend this approach? No, not if it's going to limit itself to ISO 32000-2. But you could probably make a good argument for attaching them to the Catalog instead, as that's more typical.
@faceless2
Can you attach a file to the StructTreeRoot and have the file be valid PDF/A-3? No, ISO 19005-3 says you can't.
Well, the question is why. Annex E adds the option for AF basically everywhere, so why was StructTreeRoot excluded?
The note in Annex E of ISO 19005-3 says that it was necessary to copy the material [from ISO 32000-2] here to make this compatible with ISO 32000-1. So the specification for associated files was evidently backported from PDF 2.0.
Now, ISO 32000-2 doesn't mention StructTreeRoot in 14.13 (the section about associated files) either. It only speaks of a structure element dictionary (14.13.6, "Associated files linked to structure elements"). The only indication that AF can also be put on the StructTreeRoot is in Table 354.
So the question is whether the change in Table 354 was deliberately left out of the backport (and if so, why), or whether it was simply forgotten.
ISO 19005-3 (PDF/A-3) does NOT allow Associated Files just anywhere; it only lists certain specific kinds of objects. And since it is formally based on ISO 32000-1, there are no further allowances. ISO 32000-2 (PDF 2.0), however, does allow Associated Files (almost) anywhere. (The "almost" is because certain objects do not allow indirect references.) This follows the usual pattern: the core PDF specification permits full flexibility, and the subset specifications constrain it.
The informative(!) NOTE in 19005-3 (published back in 2012) noted that the new Associated File feature that was introduced in PDF/A-3 would also be adopted into the future ISO 32000-2, but that wasn't published until much later in 2017. Because all notes are informative, they provide no change to what is stated in the normative language of ISO 19005-3 or ISO 32000-1, and thus do NOT permit additional flexibility.
PS. Of course, any key can be put anywhere in any dictionary but such things are then strictly "private data" and have no formal definition from the perspective of subsets.
@petervwyatt
The informative(!) NOTE in 19005-3 (published back in 2012) noted that the new Associated File feature that was introduced in PDF/A-3 would also be adopted into the future ISO 32000-2, but that wasn't published until much later in 2017. Because all notes are informative, they provide no change to what is stated in the normative language of ISO 19005-3 or ISO 32000-1, and thus do NOT permit additional flexibility.
Yes, I think you misunderstood the comment. It wasn't suggesting that the note was normative or that it allowed AF on the struct tree root, just that it stated an intention for the facility being added to be broadly in line with the feature being added in ISO 32000-2.
That doesn't appear to be the case here, unless there is something in the PDF 1.x description of StructTreeRoot that prevents it from having an AF key. So the question is not so much whether that note allows it, but whether it indicates that A-3 could or should have allowed it. Currently it looks like an oversight that could potentially be corrected by an erratum. I'm not saying there should be an erratum; I'm just trying to understand the specification and whether there is something different about StructTreeRoot in PDF 1.x that prevents it from having AF, given that it has an AF key in 2.0.
Tagging @bdoubrov, who chairs our PDF/A TWG, to see what they recommend regarding the validity of PDF/A-3 files with AFs on the StructTreeRoot...
Also noting that the PDF TWG has an active work item to update "PDF 2.0 Application Note 002: Associated Files", which is severely out of date.
The PDF/A TWG agrees that an /AF entry cannot be present in the StructTreeRoot as per ISO 19005-3. A suggested resolution is to adjust the derivation algorithm to associate files directly with document Catalog.
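For a producer that already writes AF on the StructTreeRoot, the suggested resolution amounts to a small data move. The sketch below uses a hypothetical, simplified model (plain Python dicts standing in for PDF dictionaries, indirect references already resolved); `move_af_to_catalog` is an illustrative name, not part of any real PDF library. It appends to any AF array the Catalog already has rather than overwriting it.

```python
from typing import Any

# Hypothetical model: PDF dictionaries as plain dicts. Not a real PDF library API.
PdfDict = dict[str, Any]


def move_af_to_catalog(catalog: PdfDict) -> int:
    """Move the /AF array from the StructTreeRoot to the Catalog, in place.

    Returns the number of file specifications moved (0 if there was nothing to move).
    """
    root = catalog.get("/StructTreeRoot")
    if not isinstance(root, dict) or "/AF" not in root:
        return 0
    moved = root.pop("/AF")
    # Keep any files already associated with the whole document, then append.
    catalog.setdefault("/AF", []).extend(moved)
    return len(moved)


catalog = {
    "/Type": "/Catalog",
    "/StructTreeRoot": {
        "/Type": "/StructTreeRoot",
        "/AF": [{"/Type": "/Filespec"}, {"/Type": "/Filespec"}],  # e.g. 33 0 R, 35 0 R
    },
}
print(move_af_to_catalog(catalog))  # 2
print("/AF" in catalog["/StructTreeRoot"])  # False
```

The alternative raised in the thread, attaching the files to the Document structure element, would be the same move with that element as the target dictionary.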
A suggested resolution is to adjust the derivation algorithm to associate files directly with document Catalog.
...or with the Document (or possibly, DocumentFragment) SEs...
I wouldn't suggest special handling of the first Document element.
Document and DocumentFragment structure elements allow associated files, and their processing is no different from that of any other structure element: the associated files serve as a replacement for, or supplement to, the element and its children.
Processing associated files attached to the Catalog allows data to be inserted into the head of the HTML output. We have no structural representation of the head (we thought StructTreeRoot might serve as one). We're going to provide alternative text in the derivation specification.