
Enhancement: PDF fragments can do things that PDF actions can't.

Open faceless2 opened this issue 10 months ago • 3 comments

It seems odd that we can use a PDF fragment identifier (normatively defined in ISO 32000-2:2020 Annex O) in a URL to instruct a PDF viewer how to jump to a destination within a PDF, but we can't use those same fragment identifiers in an action to jump within a PDF. Worse, they offer functionality beyond what we can do with existing PDF actions.

Specifically, the comment, search and highlight fragments offer functionality we can't reproduce:

  • file.pdf#comment=X instructs a PDF viewer to open file.pdf and jump to the comment with /NM(X)
  • file.pdf#search=X opens the file and requests the viewer searches for text "X".
  • file.pdf#highlight=x1,x2,y1,y2 opens the file and requests the viewer highlights an area of the PDF
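
To make the three fragments above concrete, here is a minimal Python sketch of splitting a URL's fragment into its parameters. The name parse_pdf_fragment is made up for illustration, and the sketch assumes '&'-separated name=value pairs; it is not how any particular viewer does this.

```python
from urllib.parse import urlsplit, parse_qsl


def parse_pdf_fragment(url):
    """Return the fragment parameters of `url` as (name, value) pairs.

    Hypothetical helper: assumes '&'-separated name=value pairs.
    Order is preserved, because a viewer would apply parameters in sequence.
    """
    fragment = urlsplit(url).fragment
    return parse_qsl(fragment, keep_blank_values=True)


# The three fragments from the list above; the highlight numbers are
# placeholder values for x1,x2,y1,y2.
for url in ("file.pdf#comment=X",
            "file.pdf#search=X",
            "file.pdf#highlight=10,200,30,400"):
    print(url, "->", parse_pdf_fragment(url))
```

The point is only that each fragment reduces to a small list of name/value pairs, which is what an action would need to carry to reproduce the same behaviour.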

Two solutions spring to mind - an enhancement could support either or both of these approaches:

  1. Add new "JumpToComment", "Search" and "Highlight" actions that reproduce this functionality - obvious, but causes problems with existing subsets like PDF/A. I don't think this approach is the best solution.
  2. In the same way that "SD" was added as a backwards-compatible addition for GoTo actions, add a Fragment to both regular and embedded GoTo actions to let viewers that support the fragment syntax use it, but gracefully degrade if not. For example:
  • <</S /GoTo /D [1 0 R /XYZ null null 1]>> jumps to the page 1 0 R and sets zoom=1
  • <</S /GoTo /D [1 0 R /XYZ null null 1] /Fragment (page=1&zoom=1)>> is the same action - the fragment identifier specifies the same thing
  • <</S /GoTo /D [1 0 R /XYZ null null 1] /Fragment (search=text)>> will search for the text "text" if the viewer supports the new Fragment syntax, and fall back to opening page 1 0 R if not
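
The graceful degradation in option 2 could be sketched roughly as below. This is a Python illustration only: the action is modelled as a plain dict, resolve_goto and DEST are hypothetical names, and "Fragment" is the key proposed here, not an existing PDF key.

```python
from urllib.parse import parse_qsl

# Opaque stand-in for the /D destination; its contents are not interpreted here.
DEST = "destination on page 1 0 R"


def resolve_goto(action, viewer_supports_fragment):
    """Decide what a viewer would do with a GoTo action (illustrative only).

    `action` is a plain dict standing in for the PDF action dictionary.
    "Fragment" is the proposed optional key from this issue.
    """
    fragment = action.get("Fragment")
    if fragment is not None and viewer_supports_fragment:
        # A fragment-aware viewer acts on the fragment parameters.
        return ("fragment", parse_qsl(fragment))
    # Any other viewer ignores the unknown key and uses /D as it does today.
    return ("destination", action["D"])


action = {"S": "GoTo", "D": DEST, "Fragment": "search=text"}
print(resolve_goto(action, viewer_supports_fragment=True))
print(resolve_goto(action, viewer_supports_fragment=False))
```

Because the fallback is just the existing /D entry, a file using the new key still behaves sensibly in a viewer that has never heard of it.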

The advantage of this approach is that it's easier to use with embedded files - so long as GoToE actions accepted an optional Fragment, it would be possible to open an embedded file and search for text within it as an action (I'm thinking of EA-PDF archives here). Adding an additional key also means this enhancement could work with PDF/A, PDF/UA and friends.

EDIT: The PDF Association page https://pdfa.org/pdf-fragment-identifiers/ is relevant

faceless2, Feb 27 '25 14:02

A few comments/ideas:

  • such things wouldn't necessarily impact PDF/A since that does not preserve interactive behaviour like you describe - just static page appearance. So PDF/A files would still be valid even if linkages are not guaranteed but that is no different to today.
  • URI Actions don't appear to mention fragments either, so what would happen if the URI was just a fragment like #search=X (assuming Base is not set)?
  • file specification dictionary FS entry is currently only defined to have URL as a value - so what if this was just a fragment (I believe this is permitted based on a quick skim of RFC 3986 so long as a scheme is also used)? Or we could define a new FS as Fragment...

petervwyatt, Feb 27 '25 21:02

Thank you for indulging me Peter :smile:

The PDF/A issue I was worried about is that only some actions are allowed. Actually checking both PDF/A-3 and PDF/A-4, the phrasing is that certain types are not allowed. But I still think introducing new action types would be controversial.

You've lost me on the FS entry, but you're thinking of fragments for embedded files? I don't think this will work - there may be multiple links into that embedded file, each with a different fragment. The fragment is a property of the action, not the file.

Fragment-only URI actions are a very interesting idea! Here's a sample: fragment.pdf - Chrome actually works with the page and nameddest fragments only, and Acrobat, Preview, Chrome, Safari and Firefox don't work with any of them - they're either non-functional, display an error dialog or (in Acrobat) attempt to open the PDF in the browser. They're generally non-functional, so declaring that they have this purpose makes good sense.

However, there is no obvious way to have an action jump into an embedded file. Yes, the FS value is URL, so you could resolve relative URLs against the list of embedded file names. But it doesn't work for jumping from an embedded file to a parent, or from a parent to an embedded file within an embedded file (all of which is possible with GoToE). A particularly neat solution would be defining a URL scheme that allows such files to be referenced: << /S /URI /URI (embedded:../path/to/file.pdf) >>
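
As a rough illustration of how such a URI might be taken apart, here is a small Python sketch. The "embedded:" scheme is only the idea floated above and is not defined anywhere; split_embedded_uri is a hypothetical name, and reading ".." as "go to the parent file" and allowing a trailing fragment are both assumptions.

```python
from urllib.parse import urlsplit


def split_embedded_uri(uri):
    """Split a hypothetical 'embedded:' URI into path steps and a fragment.

    Assumption: each '/'-separated step names an embedded file, and '..'
    means the parent file. Nothing here is defined by the PDF specification.
    """
    parts = urlsplit(uri)
    if parts.scheme != "embedded":
        raise ValueError("not an embedded: URI: %r" % uri)
    steps = [step for step in parts.path.split("/") if step]
    return steps, parts.fragment


print(split_embedded_uri("embedded:../path/to/file.pdf"))
print(split_embedded_uri("embedded:../path/to/file.pdf#search=text"))
```

The list of steps covers the same ground as a GoToE target chain (up to a parent, then down through nested embedded files), and the fragment stays with the action rather than with the file.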

faceless2, Feb 28 '25 09:02

You've lost me on the FS entry...

Sorry about that. I began by thinking laterally, as we already have URI actions, but then more broadly about where else URI/URLs are in PDF and what fragment URLs might mean in each of those contexts. I also remembered that actions form a tree (see NOTE 1 below Table 196), so maybe there is something creative and backward compatible that could be done there.

petervwyatt, Feb 28 '25 12:02