Do RS use epub:type structural vocabulary to know where the main content ends?

Open laudrain opened this issue 5 years ago • 9 comments

Reading Systems gather knowledge from reader behavior. They know a book has been read entirely when the very last XHTML document in the spine has been paginated through to the very last word. But more and more EPUB files end with marketing documents (author interview, teaser for the next volume, etc.).

So if the reader doesn't read these final marketing documents, the Reading System never reports the book as finished. And a book that the reader did not finish may well be a sign that the reader did not find it worth reading to the end...

Thanks to epub:type, publishers use the structural semantics vocabulary to mark where the main content ends. When the reader reaches the end of the last content document with epub:type="bodymatter", the book is finished!

Do Reading Systems use this markup?
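
As an illustration of the idea, here is a minimal sketch of how a reading system might find the last spine document typed as bodymatter. It is not how any particular reading system works. The function names are hypothetical, the spine is assumed to be already loaded as a list of XHTML strings in reading order, and each document is assumed to be well-formed XML with no undefined entities.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
EPUB_NS = "http://www.idpf.org/2007/ops"


def body_types(xhtml_text):
    """Return the set of epub:type values on the <body> element of one document."""
    root = ET.fromstring(xhtml_text)
    body = root.find(f"{{{XHTML_NS}}}body")
    if body is None:
        return set()
    # epub:type holds a space-separated list of vocabulary terms.
    return set(body.get(f"{{{EPUB_NS}}}type", "").split())


def last_bodymatter_index(spine_docs):
    """Return the spine index of the last document typed as bodymatter, or None."""
    last = None
    for index, text in enumerate(spine_docs):
        if "bodymatter" in body_types(text):
            last = index
    return last
```

A result of None means no document carries the term, in which case the reading system would have to fall back on whatever it does today.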

laudrain avatar Jul 12 '19 13:07 laudrain

Even if RSes use epub:type to assess which HTML file is the last one, this is usually applied at the beginning of a document. If a system is attempting to determine whether a user has finished a book (e.g. completed the last chapter but not read the index, which might have epub:type="backmatter"), will it be able to assess whether the reader has started the last chapter but not finished it?

Further, there are almost no rules around the application of the SSV, so I don't think that this is a reliable method.

TzviyaSiegman avatar Jul 12 '19 14:07 TzviyaSiegman

The Structural Semantics Vocabulary is a rule in itself. And structure is well known from publishing processes. As "bodymatter" is a value that can be set on the <body> tag, the closing tag shows exactly where the content ends.

laudrain avatar Jul 12 '19 14:07 laudrain

Couldn't you identify the beginning of the backmatter in the landmarks, same as where the bodymatter starts?
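
A rough sketch of that lookup, assuming the navigation document is well-formed XHTML and contains a nav element with epub:type="landmarks". The function name landmark_href is made up for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
EPUB_NS = "http://www.idpf.org/2007/ops"
EPUB_TYPE = f"{{{EPUB_NS}}}type"


def landmark_href(nav_text, term):
    """Return the href of the first landmark whose epub:type includes term, or None."""
    root = ET.fromstring(nav_text)
    for nav in root.iter(f"{{{XHTML_NS}}}nav"):
        # Skip the toc and page-list navs; only the landmarks nav is relevant.
        if "landmarks" not in nav.get(EPUB_TYPE, "").split():
            continue
        for link in nav.iter(f"{{{XHTML_NS}}}a"):
            if term in link.get(EPUB_TYPE, "").split():
                return link.get("href")
    return None
```

Calling landmark_href(nav_text, "backmatter") would give the target where the backmatter begins, and the same call with "bodymatter" gives the start of the main content.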

mattgarrish avatar Jul 12 '19 14:07 mattgarrish

They are! These last content documents beyond the main content are identified with epub:type="backmatter" in the <body> tag. At Hachette Livre, all our EPUB 3 files have a mandatory epub:type on each XHTML body tag. These epub:type values on <body> must follow the logical structural order: cover, frontmatter, bodymatter, backmatter.
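
The ordering rule can be sketched as a simple check. This is only an illustration of the constraint as described here, not Hachette Livre's actual tooling. The name follows_structural_order is hypothetical, and the input is assumed to be one epub:type term per spine document, in reading order.

```python
SECTION_ORDER = ["cover", "frontmatter", "bodymatter", "backmatter"]


def follows_structural_order(spine_types):
    """Check that section types never move backwards along the spine.

    Terms outside SECTION_ORDER are rejected rather than guessed at.
    """
    rank = {term: position for position, term in enumerate(SECTION_ORDER)}
    previous = -1
    for term in spine_types:
        if term not in rank:
            return False
        if rank[term] < previous:
            return False
        previous = rank[term]
    return True
```

Repeated values are allowed (a book normally has many bodymatter documents). The check only fails when a later document returns to an earlier section.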

laudrain avatar Jul 12 '19 14:07 laudrain

How reading systems decide if a book has been "finished" is far, far outside the scope of the EPUB 3 specification. There is no interop problem. But this is an example where using the SSV can provide a very useful piece of information to a reading system, should it choose to use it. And best of all, this information exists in quite a few existing EPUBs.

I'm wary of suggesting a landmark, just because not much existing content has a useful landmark.

This is really up to reading systems. What if I open a book, follow a landmark to the first backmatter section, and then go backwards by one page? I've technically reached the end of the bodymatter without reading 1% of the book. Of course many books won't have even this information.
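
To make that edge case concrete, here is a hedged sketch of a rule that would not be fooled by jumping to the backmatter and paging back once. Everything in it is an assumption for illustration: the page numbering, the function names, and especially the 0.9 threshold, which is an arbitrary placeholder and not a recommended value.

```python
def bodymatter_coverage(pages_seen, first_body_page, last_body_page):
    """Fraction of bodymatter pages (inclusive range) that the reader has displayed."""
    total = last_body_page - first_body_page + 1
    if total <= 0:
        return 0.0
    seen = sum(
        1 for page in set(pages_seen)
        if first_body_page <= page <= last_body_page
    )
    return seen / total


def looks_finished(pages_seen, first_body_page, last_body_page, threshold=0.9):
    """Hypothetical rule: last bodymatter page reached AND most of the body displayed."""
    pages = set(pages_seen)
    reached_end = last_body_page in pages
    coverage = bodymatter_coverage(pages, first_body_page, last_body_page)
    return reached_end and coverage >= threshold
```

With a 100-page body, a reader who lands on page 101 and steps back to page 100 has reached the end of the bodymatter but covered only 1% of it, so looks_finished returns False.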

dauwhe avatar Jul 12 '19 15:07 dauwhe

I think we also need to think about whether backmatter is part of the content or not. What about an appendix? Many books have appendices that are really content. Should that be "bodymatter" or "backmatter"?

TzviyaSiegman avatar Jul 12 '19 15:07 TzviyaSiegman

> What if I open a book, follow a landmark to the first backmatter section, and then go backwards by one page?

Couldn't you do the same thing by following the toc to the backmatter and going back one page, whatever method is used, though?

I'm just thinking it would be simpler to have one landmark that identifies the spine item where the backmatter begins and let the reading system worry about tracking the rest than expect authors to mark every content document with semantics.

Isn't the bodymatter landmark generally used for locating the first page to begin reading? I thought that was the one that actually got some uptake.

But I agree this doesn't seem like territory for the specification.

mattgarrish avatar Jul 12 '19 15:07 mattgarrish

This definitely falls into "what should RSs do with what we have". I think the spec covers this case well; "backmatter" is pretty unambiguous. As a reading system, if I know that there is a backmatter section, I definitely wouldn't want to mark the item as finished without user input. Maybe they do want to read the index (maybe it's a good index!). But currently we don't get this data consistently. If we addressed this as a matter for landmarks or just in the nav doc, I think we could poll RSs on what they would like to see and push it as a best practice.

wareid avatar Jul 12 '19 20:07 wareid

Well, as somebody who has spent half a lifetime looking at users' reading behavior, I can say that how people read books and what constitutes "finished" is a great deal more complicated than "final chapter has been paginated" (see, for example, what happened to Amazon's KU when it took "last page synced" as a signal...). Having semantic landmarks for the start and end of body matter (in a novel or a book with narrative structure) is indeed extremely useful in all sorts of scenarios other than "finished book", and how the reading system figures out whether the reader actually finished that book uses a lot more inputs than "last chapter paginated". For non-fiction books that don't have a narrative structure, the situation is altogether different and the landmarks may make no sense...

arhomberg avatar Jul 14 '19 16:07 arhomberg