validation: incremental updates with catalog Version
When validating a PDF file that has been modified using incremental updates, pdfcpu appears to validate the entire document against the highest PDF version introduced by any of the incremental updates, rather than respecting the original PDF version of earlier sections of the document.
According to the PDF standard (e.g., ISO 32000-2:2020(E), Section 7.5.6):
The contents of a PDF file can be updated incrementally without rewriting the entire file. When updating a PDF file incrementally, changes shall be appended to the end of the file, leaving its original contents intact.
While the /Version entry in the document catalog dictionary (available from PDF 1.4+) can explicitly declare a higher minimum required PDF version for the document as a whole, the original content is left as is and remains structured and valid according to its original version. To my understanding, a compliant PDF reader should process the original content based on its version and then apply the subsequent updates, interpreting new features based on the version they were introduced in.
The current validation behavior in pdfcpu seems to apply the requirements of the latest PDF version (present in the incremental update) retroactively to elements in the original document sections, even if those elements were valid according to the original document's PDF version.
Example
Consider a PDF 1.2 document containing Type 1 fonts. In PDF 1.2, the FirstChar entry in a Type1Font dictionary was optional. In PDF 1.5, this entry became required. If this PDF 1.2 document is incrementally updated, and the update introduces PDF 1.5 features (or the /Version entry is updated to 1.5 or later), pdfcpu validation may incorrectly flag the original Type 1 font dictionaries as invalid because they are missing the FirstChar entry, resulting in an error like dict=type1FontDict required entry=FirstChar missing. This is incorrect because the original font dictionary was valid under the PDF 1.2 specification.
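To make the example concrete, here is a minimal Go sketch of a version-dependent required-entry check. It is not pdfcpu's code: `Version`, `requiredSince`, and `missingRequired` are hypothetical names, and the 1.5 threshold for FirstChar is simply taken from the example above rather than verified against the spec.

```go
package main

import "fmt"

// Version is a PDF version such as 1.2, stored as major and minor numbers.
type Version struct{ Major, Minor int }

// Less reports whether v is older than w.
func (v Version) Less(w Version) bool {
	if v.Major != w.Major {
		return v.Major < w.Major
	}
	return v.Minor < w.Minor
}

// requiredSince is a hypothetical rule table: entry name -> first version
// in which the entry is required. The value mirrors the example in this report.
var requiredSince = map[string]Version{
	"FirstChar": {1, 5},
}

// missingRequired returns the entries that are required at version `at`
// but absent from dict.
func missingRequired(dict map[string]any, at Version) []string {
	var missing []string
	for entry, since := range requiredSince {
		if _, ok := dict[entry]; !ok && !at.Less(since) {
			missing = append(missing, entry)
		}
	}
	return missing
}

func main() {
	// A Type 1 font dictionary without FirstChar, as in the PDF 1.2 original.
	font := map[string]any{"Type": "Font", "Subtype": "Type1", "BaseFont": "Helvetica"}

	// Checked against the original header version.
	fmt.Println(missingRequired(font, Version{1, 2}))
	// Checked against the version introduced by the incremental update.
	fmt.Println(missingRequired(font, Version{1, 5}))
}
```

The same dictionary passes or fails depending only on which version is handed to the check, which is the behavior difference this report is about.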
This can only be an issue if an increment overwrites the catalog version. Very exotic! Such an overwrite must be intentional and done for a reason, and it changes the logical PDF version and therefore the behavior of the PDF processor.
What you are quoting does not apply to this situation (modifying the PDF version of the file).
An interesting case arises if the Version of the increment downgrades the PDF version.
It's not allowed to downgrade.
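The upgrade-only rule can be sketched in Go as follows. This is a minimal illustration, not pdfcpu's implementation: `parseVersion` and `effectiveVersion` are hypothetical helpers, and the sketch assumes that a catalog Version only takes effect when it is later than the header version and is otherwise ignored.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns a string such as "1.5" into comparable numbers.
func parseVersion(s string) (major, minor int, err error) {
	majStr, minStr, ok := strings.Cut(s, ".")
	if !ok {
		return 0, 0, fmt.Errorf("malformed PDF version %q", s)
	}
	if major, err = strconv.Atoi(majStr); err != nil {
		return 0, 0, err
	}
	if minor, err = strconv.Atoi(minStr); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// effectiveVersion returns the version to process the file with: the catalog
// Version if it is present and later than the header version, otherwise the
// header version. An empty catalog string means "no Version entry".
func effectiveVersion(header, catalog string) (string, error) {
	if catalog == "" {
		return header, nil
	}
	hMaj, hMin, err := parseVersion(header)
	if err != nil {
		return "", err
	}
	cMaj, cMin, err := parseVersion(catalog)
	if err != nil {
		return "", err
	}
	if cMaj > hMaj || (cMaj == hMaj && cMin > hMin) {
		return catalog, nil // upgrade takes effect
	}
	return header, nil // attempted downgrade is ignored
}

func main() {
	for _, c := range [][2]string{{"1.2", "1.5"}, {"1.5", "1.2"}, {"1.4", ""}} {
		v, err := effectiveVersion(c[0], c[1])
		fmt.Println(c[0], c[1], "->", v, err)
	}
}
```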
In this case the catalog version is indeed upgraded in an incremental update to enable features not supported in older PDF versions. However, the original file content cannot be modified, as doing so would invalidate existing digital signatures.
Technically an increment is supplying new objects or newer versions of existing objects.
If an old object is upgraded by a new one, then the PDF file content is potentially altered; there is no way around that.
There is no explicit guideline in the spec for this, but this is my understanding:
If the Version is upgraded, then a processor cannot ignore this, and therefore validation and further processing are based on the upgraded Version.
This is independent of any signatures present; there are also increments without signatures.
@petervwyatt Please correct me if I am wrong, which may very well be the case.
Generally, that is correct, but there are some nuances, and it depends on what features you want to offer:
- If a dig-sig ever existed in the PDF, then the PDF revision(s) precisely at the time(s) of signing need to be very carefully checked (i.e., excluding any and all later incremental updates). Note that an incremental update could flag those dig-sig objects as "free," so there is a lot of detail here (such as processing all incremental updates and noticing which objects got deleted in which incremental updates, before deciding what to do - you cannot just trust the final PDF!).
- dig-sig has various permissions (MDP) which may need to be checked as to which kinds of objects have been added/deleted in incremental updates. You shouldn't just trust the final PDF!
- An incremental update does NOT have to modify any objects - it may simply update key/value pairs in the trailer (whether conventional or an XRefStm). Hence, in conventional cross-reference tables, you might see xref 0 0, which is perfectly valid.
- Malicious files can do other nasty things... e.g., shadow attacks
- There are utilities such as pdfresurrect which try to "peel off" each incremental update to "resurrect" earlier PDF revisions. Mileage may vary with certain PDFs with such tools.
- Hybrid reference PDFs are another issue altogether if they need to be validated, since there are 2 PoVs (an aware processor and an unaware (legacy) processor)...
And let's not even mention ByteRange!
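As a rough illustration of what "peeling off" incremental updates involves, here is a minimal Go sketch that treats every %%EOF marker as the end of one revision. This is a naive heuristic and an assumption on my part, not how pdfresurrect or pdfcpu work: `revisionEnds` is a hypothetical helper, %%EOF bytes can also occur inside stream data, and a robust tool would follow the startxref and Prev chain instead.

```go
package main

import (
	"bytes"
	"fmt"
)

// revisionEnds returns the byte offset just past each "%%EOF" marker.
// Under the naive assumption that each marker closes one revision,
// data[:ends[i]] is the file as it looked after revision i+1.
func revisionEnds(data []byte) []int {
	marker := []byte("%%EOF")
	var ends []int
	pos := 0
	for {
		i := bytes.Index(data[pos:], marker)
		if i < 0 {
			return ends
		}
		pos += i + len(marker)
		ends = append(ends, pos)
	}
}

func main() {
	// Toy stand-in for a file with one incremental update; not a valid PDF.
	data := []byte("%PDF-1.2\n...original...\n%%EOF\n...update...\n%%EOF\n")
	for n, end := range revisionEnds(data) {
		fmt.Printf("revision %d: bytes [0, %d)\n", n+1, end)
	}
}
```

Each prefix could then be checked on its own, for example the revision that existed exactly at signing time, before any later updates are taken into account.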