
Add a start landmark (to the PDF Profile)

Open · mwbenowitz opened this issue 3 years ago · 8 comments

This adds a simple PDF profile covering Manifests that consist only of PDF files (every resource must have the media type application/pdf).

This accounts for a structure used in some projects, where a collection of PDF files is presented to users as a single readable publication. For example, a publication can be split into one PDF file per section or chapter while still offering a unified reading experience.

This does not alter the Manifest itself; it only requires conforming manifests to meet the requirements laid out in the profile. The profile also allows start parameters in link href strings, so that a manifest can declare a start page for each file, for example to skip blank pages at the start of a file.
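As a rough illustration of the requirement that every resource be a PDF, a manifest validator might check media types along these lines. This is a minimal sketch, not part of the proposed profile: it assumes the manifest has already been parsed from JSON into a dict, and the function names are hypothetical.

```python
# Hypothetical PDF-profile check: every link in readingOrder and resources
# must declare the media type application/pdf.
PDF_MEDIA_TYPE = "application/pdf"


def non_pdf_links(manifest: dict) -> list:
    """Return the hrefs of readingOrder/resources links that are not PDFs."""
    offenders = []
    for collection in ("readingOrder", "resources"):
        for link in manifest.get(collection, []):
            # Ignore media type parameters and letter case when comparing.
            media_type = link.get("type", "").split(";")[0].strip().lower()
            if media_type != PDF_MEDIA_TYPE:
                offenders.append(link.get("href", ""))
    return offenders


def conforms_to_pdf_profile(manifest: dict) -> bool:
    """True when no readingOrder/resources link has a non-PDF media type."""
    return not non_pdf_links(manifest)


manifest = {
    "readingOrder": [
        {"href": "chapter1.pdf", "type": "application/pdf"},
        {"href": "chapter2.pdf", "type": "application/pdf"},
    ],
    "resources": [{"href": "cover.jpg", "type": "image/jpeg"}],
}
print(non_pdf_links(manifest))  # ['cover.jpg']
```

Links with no type are reported as offenders in this sketch, which is stricter than guessing the type from the file extension.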

mwbenowitz, Jul 06 '22

We talked about this on today's call. Nothing is set in stone yet, but here is what came up:

  • It's generally a good idea; we need such a profile.
  • A start relation to indicate the start of the publication would be very useful.
  • We don't want to support fragments inside the reading order, so in your example with blank PDF pages, this would need to be addressed at the authoring stage.
  • Having multiple PDF resources in the reading order is a useful use case. It might be tricky to implement in navigators, but there are workarounds, such as displaying an affordance at the end of the current PDF resource.
  • A pageCount link property would be welcome if we have multiple PDF resources, to be able to compute a positions list without opening all the resources.
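To illustrate the pageCount idea, here is a minimal sketch of how a reading system could build a positions list from the manifest alone, without opening any PDF. pageCount is only a proposed link property in this discussion, and the output loosely follows the Locator shape (href, type, locations.fragments, locations.position); treat the function as hypothetical.

```python
def positions_from_page_count(reading_order: list) -> list:
    """Build one Locator-like dict per PDF page, using the proposed pageCount."""
    positions = []
    for link in reading_order:
        page_count = link.get("pageCount")  # proposed property, not yet in RWPM
        if page_count is None:
            # Without pageCount, the resource would have to be opened instead.
            raise ValueError(f"missing pageCount for {link.get('href')}")
        for page in range(1, page_count + 1):
            positions.append({
                "href": link["href"],
                "type": link.get("type", "application/pdf"),
                "locations": {
                    "fragments": [f"page={page}"],
                    # Positions are numbered from 1 across the whole publication.
                    "position": len(positions) + 1,
                },
            })
    return positions


reading_order = [
    {"href": "chapter1.pdf", "type": "application/pdf", "pageCount": 2},
    {"href": "chapter2.pdf", "type": "application/pdf", "pageCount": 3},
]
positions = positions_from_page_count(reading_order)
print(len(positions))  # 5
print(positions[2]["href"], positions[2]["locations"])
# chapter2.pdf {'fragments': ['page=1'], 'position': 3}
```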

mickael-menu, Jul 13 '22

Thank you for the feedback! I was unable to join the call; I don't think I actually have the meeting link. Is there a way I could get it? As for the specifics:

  • I think having start as a relation is a great idea. I hadn't considered it, but it would be useful for us as well.
  • What about supporting fragments in the Table of Contents? We don't author these PDFs and would like to support this functionality. I understand that the reading order has a specific meaning that might not be amenable to incorporating fragments. I'd like to find some way we could represent that as an attribute or property.
  • I think we've been able to handle the multiple PDF resource question on our end. If you're interested, I can probably put you in touch with the devs who did that work.
  • pageCount is something that I think we would like to see. This was intended to be a starting point, and features like these are definitely the direction I'd like to see this go.

mwbenowitz, Jul 14 '22

We paused the weekly calls for the summer, but I think we'll be back at the end of August. You can send an email to [email protected] to request access to the Readium Slack workspace (mention this PR in the email). The link and time are shared in the #general channel.

  • What about supporting fragments in the Table of Contents? We don't author these PDFs and would like to support this functionality.

Only readingOrder and resources cannot contain fragments; you can use them in tableOfContents, links, etc. It would look like this:

{
  "href":"chapter1.pdf#page=32",
  "type": "application/pdf"
}
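A reading system following such a tableOfContents entry has to split the href into the resource to open and the page to display. A minimal sketch using Python's standard urllib.parse, assuming the fragment uses the PDF page=N form and defaulting to page 1 when there is none (split_pdf_href is a hypothetical helper):

```python
from urllib.parse import parse_qs, urldefrag


def split_pdf_href(href: str) -> tuple:
    """Split 'chapter1.pdf#page=32' into ('chapter1.pdf', 32)."""
    resource, fragment = urldefrag(href)
    # PDF fragment parameters are '&'-separated key=value pairs.
    pages = parse_qs(fragment).get("page")
    page = int(pages[0]) if pages else 1  # no page parameter: start of file
    return resource, page


print(split_pdf_href("chapter1.pdf#page=32"))  # ('chapter1.pdf', 32)
print(split_pdf_href("chapter2.pdf"))          # ('chapter2.pdf', 1)
```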

I think we've been able to handle the multiple PDF resource question on our end. If you'd be interested I can probably put you in touch with the devs who did that work

Sure, that would be very interesting, thanks. Which PDF engine(s) are you using?

mickael-menu, Jul 18 '22

Hey y'all, just checking in here on the progress of this. I have two pieces of input:

  1. Having the start rel seems like a good idea, and combined with a fragment in the links, it should let us skip a blank first page. However, it would only do that for the first resource in the reading order, not for every resource. As @mwbenowitz said, we don't author the PDFs and would like to find a way to describe a collection like this in the reading order. If you have any other suggestions there, we're all ears.
  2. I am curious how fragments are specified in the RWPM? Is there a standard set of allowed fragments? It seems they should be defined in a profile, either for an individual media type or for RWPM as a whole. For example, the t=3.2 fragment makes sense for audiobooks and the page=4 fragment works for PDFs, but neither works for EPUB. What are your thoughts?
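The limitation in point 1 shows up in a sketch of how a reading system might choose its entry point. This assumes a hypothetical link with rel "start" in the manifest's links whose fragment carries the page; since there is only one such entry point, a blank first page in any later PDF would still be displayed.

```python
from urllib.parse import parse_qs, urldefrag


def initial_location(manifest: dict) -> tuple:
    """Return (href, page) where reading should begin."""
    for link in manifest.get("links", []):
        rels = link.get("rel", [])
        if isinstance(rels, str):  # rel may be a single string or a list
            rels = [rels]
        if "start" in rels:  # hypothetical "start" relation
            resource, fragment = urldefrag(link["href"])
            pages = parse_qs(fragment).get("page")
            return resource, int(pages[0]) if pages else 1
    # No start link: begin at page 1 of the first readingOrder item.
    return manifest["readingOrder"][0]["href"], 1


manifest = {
    "readingOrder": [
        {"href": "chapter1.pdf", "type": "application/pdf"},
        {"href": "chapter2.pdf", "type": "application/pdf"},
    ],
    "links": [{"href": "chapter1.pdf#page=2", "rel": "start", "type": "application/pdf"}],
}
print(initial_location(manifest))  # ('chapter1.pdf', 2)
```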

kristojorg, Dec 30 '22

If you have any other suggestions there, we're all ears.

RWPM doesn't support fragments in readingOrder. They would make some things really complicated, for example computing the positions list of an EPUB, or navigating backwards.

To get the best compatibility, and since you have the info, you could process the PDFs by removing the blank pages before packaging them.
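Since fragments are not allowed in readingOrder (or resources), a packaging or validation step could flag them before publishing. A minimal sketch with a hypothetical helper name; it leaves tableOfContents and links alone because fragments are fine there:

```python
from urllib.parse import urldefrag


def hrefs_with_fragments(manifest: dict) -> list:
    """List readingOrder/resources hrefs that carry a (disallowed) fragment."""
    flagged = []
    for collection in ("readingOrder", "resources"):
        for link in manifest.get(collection, []):
            if urldefrag(link["href"]).fragment:
                flagged.append(link["href"])
    return flagged


manifest = {
    "readingOrder": [{"href": "chapter1.pdf#page=2"}, {"href": "chapter2.pdf"}],
    "tableOfContents": [{"href": "chapter2.pdf#page=3"}],  # allowed here
}
print(hrefs_with_fragments(manifest))  # ['chapter1.pdf#page=2']
```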

I am curious how fragments are specified in the RWPM?

Fragment identifiers are not directly specified in the Link object, but we have a convention of using them in some cases as #anchor in href.

However, they are mentioned in the specification of the Locator object. Since they are specific to each media type, t=3.2 wouldn't be valid for a PDF:

They're by nature media-specific and should always be understood in the context of the resource that the locator points to (by looking at href and type).

It also identifies which fragment identifier specs are recognized:

| Specification | Scope | Examples |
|---|---|---|
| HTML | HTML or XHTML | id |
| Media Fragment URI 1.0 | Audio, Video and Images | t=67, xywh=160,120,320,240 |
| PDF | PDF | page=12, viewrect=50,50,640,480 |

In practice, though, it really depends on what each Navigator implements. For example, viewrect is not currently supported in the official PDF navigators.
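A reading system could use the table above to reject fragments that don't match a resource's media type. The sketch below is deliberately simplified: it only knows the example keys from the table (the Media Fragments and PDF specifications define more), treats any non-empty fragment as an HTML id, and the function name is hypothetical.

```python
def is_recognized_fragment(media_type: str, fragment: str) -> bool:
    """Rough check that a fragment fits the scheme for the resource's media type."""
    media_type = media_type.split(";")[0].strip().lower()
    if media_type in ("text/html", "application/xhtml+xml"):
        return bool(fragment)  # HTML: an element id
    if media_type.split("/")[0] in ("audio", "video", "image"):
        allowed = {"t", "xywh"}  # Media Fragment URI 1.0 (subset)
    elif media_type == "application/pdf":
        allowed = {"page", "viewrect"}  # PDF fragment identifiers (subset)
    else:
        return False  # no recognized fragment scheme
    # Split '&'-separated key=value pairs and keep only the keys.
    keys = [part.split("=", 1)[0] for part in fragment.split("&") if part]
    return bool(keys) and all(key in allowed for key in keys)


print(is_recognized_fragment("application/pdf", "page=12"))  # True
print(is_recognized_fragment("application/pdf", "t=3.2"))    # False
print(is_recognized_fragment("audio/mpeg", "t=3.2"))         # True
```

Even when this returns True, a given Navigator may not implement the fragment (viewrect being the example mentioned above).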

mickael-menu, Jan 02 '23

I created a simpler PR to get something released soon: https://github.com/readium/webpub-manifest/pull/97. We may update this profile later if we agree to add content from @mwbenowitz's proposal.

llemeurfr, Apr 22 '23

Note that I would be in favour of a generic start relationship, not one limited to the PDF profile.

llemeurfr, Apr 22 '23

Note also that this "start" is in fact a landmark. Web Publications defines a mechanism for landmarks, based on the EPUB solution. It works like a TOC and can therefore handle fragments.
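If "start" were expressed as a landmark, a reading system could look it up in a TOC-like collection where, unlike readingOrder, the href may include a fragment. This sketch assumes a hypothetical landmarks collection whose links name the landmark kind in rel; neither that layout nor the "start" value is confirmed by this thread.

```python
from typing import Optional


def find_landmark(manifest: dict, kind: str) -> Optional[str]:
    """Return the href of the first landmark of the given kind, or None."""
    for link in manifest.get("landmarks", []):  # hypothetical collection
        rels = link.get("rel", [])
        if isinstance(rels, str):
            rels = [rels]
        if kind in rels:
            return link["href"]  # may carry a fragment, e.g. '#page=3'
    return None


manifest = {
    "readingOrder": [{"href": "chapter1.pdf", "type": "application/pdf"}],
    "landmarks": [
        {"href": "chapter1.pdf#page=3", "type": "application/pdf", "rel": "start"},
    ],
}
print(find_landmark(manifest, "start"))  # chapter1.pdf#page=3
```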

llemeurfr, May 01 '23