Extra white space after non-breaking hyphens on the Kobo build

Open Kos opened this issue 9 months ago • 6 comments

Symptom

When the kepub (se) build is viewed on a Kobo device, extra white space appears after a word containing non-breaking hyphens.

The same issue is not observed on the epub build or the kepub build created using kepubify.

Correct:

Image

Incorrect:

Image

Complete test case available on this branch: https://github.com/Kos/jules-verne_robur-the-conqueror/tree/whitespace-rendering-test

XHTML differences

Example line in source and epub:

<p>And the inexplicable <i>f‑r‑r‑r‑r</i> seemed to sweep along below it.</p>

The same line in kepub (se):

<p><span id="kobo.12.1" class="koboSpan">And the inexplicable </span><i><span id="kobo.12.2" class="koboSpan">f⁠-⁠r⁠-⁠r⁠-⁠r⁠-⁠r</span></i><span id="kobo.13.1" class="koboSpan"> seemed to sweep along below it.</span></p>

The same line in kepub (kepubify):

<p><span class="koboSpan" id="kobo.5.1">And the inexplicable </span><i><span class="koboSpan" id="kobo.5.2">f‑r‑r‑r‑r</span></i><span class="koboSpan" id="kobo.5.3"> seemed to sweep along below it.</span></p>

The changes seem structurally the same; only numerical IDs are different.

CSS differences

core.css has some lines removed in kepub (se):

--- "test case.epub.d/epub/css/core.css"        2025-03-08 16:00:02.000000000 +0100
+++ "test case.kepub.epub.d/epub/css/core.css"  2025-03-08 16:00:02.000000000 +0100
@@ -4,9 +4,6 @@
 body{
        font-variant-numeric: oldstyle-nums;
        hyphens: auto;
-       adobe-hyphenate: auto;
-       -webkit-hyphens: auto;
-       -moz-hyphens: auto;
        -epub-hyphens: auto;
        text-wrap: pretty;
 }
@@ -55,10 +52,6 @@
        page-break-inside: avoid;
        font-variant: small-caps;
        hyphens: none;
-       adobe-text-layout: optimizeSpeed; /* For Nook */
-       adobe-hyphenate: none;
-       -webkit-hyphens: none;
-       -moz-hyphens: none;
        -epub-hyphens: none;
        margin-top: 3em;
        margin-right: 0;
@@ -168,10 +161,6 @@
        break-inside: avoid;
        page-break-inside: avoid;
        hyphens: none;
-       adobe-text-layout: optimizeSpeed; /* For Nook */
-       adobe-hyphenate: none;
-       -webkit-hyphens: none;
-       -moz-hyphens: none;
        -epub-hyphens: none;
        text-align: center;
 }

core.css is unchanged in kepubify.

se.css has some adjustments in kepub (se):

--- "test case.epub.d/epub/css/se.css"  2025-03-08 16:00:02.000000000 +0100
+++ "test case.kepub.epub.d/epub/css/se.css"    2025-03-08 16:00:02.000000000 +0100
@@ -73,10 +73,6 @@
 section.epub-type-imprint a,
 section.epub-type-colophon a{
        hyphens: none;
-       adobe-text-layout: optimizeSpeed; /* For Nook */
-       adobe-hyphenate: none;
-       -webkit-hyphens: none;
-       -moz-hyphens: none;
        -epub-hyphens: none;
 }
 
@@ -94,7 +90,7 @@
        text-indent: 0;
 }
 
-section.epub-type-copyright-page blockquote p span{
+section.epub-type-copyright-page blockquote p span.se{
        display: block;
        padding-left: 1em;
        text-indent: -1em;
@@ -102,4 +98,13 @@
 
 section.epub-type-copyright-page blockquote br{
        display: none;
-}
\ No newline at end of file
+}
+
+/* Kobo compatibility CSS */
+
+section[epub|type~="titlepage"] h1,
+section[epub|type~="titlepage"] p,
+section[epub|type~="colophon"] h2,
+section[epub|type~="imprint"] h2{
+       font-size: 0; /* Required for Kobo not to add an extra page to the title */
+}

se.css is again unchanged in kepubify.

testcase.xhtml has some CSS added by kepubify, but it looks irrelevant:

--- "test case.epub.d/epub/text/testcase.xhtml" 2025-03-08 16:00:02.000000000 +0100
+++ "test case_kepubify.kepub.epub.d/epub/text/testcase.xhtml"  2025-03-08 17:00:02.000000000 +0100
@@ -1,30 +1,29 @@
-<?xml version="1.0" encoding="utf-8"?>
-<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" lang="en-US" epub:prefix="z3998: http://www.daisy.org/z3998/2012/vocab/structure/, se: https://standardebooks.org/vocab/1.0" xml:lang="en-US">
-       <head>
+<?xml version="1.0" encoding="utf-8"?><html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" lang="en-US" epub:prefix="z3998: http://www.daisy.org/z3998/2012/vocab/structure/, se: https://standardebooks.org/vocab/1.0" xml:lang="en-US"><head>
                <title>Chapter</title>
                <link href="../css/core.css" rel="stylesheet" type="text/css"/>
                <link href="../css/local.css" rel="stylesheet" type="text/css"/>
-       </head>
-       <body epub:type="bodymatter z3998:fiction">
+       <style type="text/css" class="kobostylehacks">div#book-inner { margin-top: 0; margin-bottom: 0;}</style></head>
+       <body epub:type="bodymatter z3998:fiction"><div id="book-columns"><div id="book-inner">
                <section id="chapter-6" role="doc-chapter" epub:type="chapter">

Kos avatar Mar 09 '25 08:03 Kos

Again, this is an issue with the renderer used by Kobo. When you open a regular epub, you trigger the ADE renderer, which is very limited. When you open a kepub, you trigger their advanced renderer, which is based on WebKit. The two renderers will render the same epub in very different ways, and that's a Kobo problem.

Since you already have the test cases lined up, why don't you continue digging to see exactly which difference is causing these issues?

acabal avatar Mar 10 '25 18:03 acabal

Update: I did a git bisect over all the changes between...

  • (A) .kepub.epub generated with SE tools
  • (B) .epub generated with SE tools, then converted into .kepub.epub using kepubify

and checked the intermediate versions on my device. This gave me a clear comparison between the changes introduced by SE-specific kepub generation and the changes that kepubify chooses to make. Recall that I observed artifacts in (A) but not in (B).

My initial guess about CSS was a miss: I tracked the problem down to a difference in generated XHTML.

SE generates:

<p>And the inexplicable <i>f‑r‑r‑r‑r</i> seemed to sweep along below it.</p>

where the characters of the italicized word are:

LATIN SMALL LETTER F
NON-BREAKING HYPHEN
LATIN SMALL LETTER R
NON-BREAKING HYPHEN
LATIN SMALL LETTER R
NON-BREAKING HYPHEN
LATIN SMALL LETTER R
NON-BREAKING HYPHEN
LATIN SMALL LETTER R

Kepubify leaves the non-breaking hyphens as is when converting to .kepub.epub:

<p><span class="koboSpan" id="kobo.5.1">And the inexplicable </span><i><span class="koboSpan" id="kobo.5.2">f‑r‑r‑r‑r</span></i><span class="koboSpan" id="kobo.5.3"> seemed to sweep along below it.</span></p>

However, the SE kepub actually uses different Unicode code points:

<p><span id="kobo.12.1" class="koboSpan">And the inexplicable </span><i><span id="kobo.12.2" class="koboSpan">f⁠-⁠r⁠-⁠r⁠-⁠r⁠-⁠r</span></i><span id="kobo.13.1" class="koboSpan"> seemed to sweep along below it.</span></p>

where the characters of the italicized word are:

LATIN SMALL LETTER F
WORD JOINER
HYPHEN-MINUS
WORD JOINER
LATIN SMALL LETTER R
WORD JOINER
HYPHEN-MINUS
WORD JOINER
LATIN SMALL LETTER R
WORD JOINER
HYPHEN-MINUS
WORD JOINER
LATIN SMALL LETTER R
WORD JOINER
HYPHEN-MINUS
WORD JOINER
LATIN SMALL LETTER R

The code points in the examples above were identified using the Python snippet below; I can't guarantee that GitHub's formatting will keep them intact. You can replicate my results using the built files in the repo: https://github.com/Kos/jules-verne_robur-the-conqueror/tree/whitespace-rendering-test/dist

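import unicodedata

def show(s):
    # Print the code point and Unicode name of every character in s.
    for c in s:
        # The default avoids a ValueError on unnamed characters such as newlines.
        print(f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}")

show('''...paste...''')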

Do we know why SE tools replace NON-BREAKING HYPHEN with WORD JOINER + HYPHEN-MINUS + WORD JOINER? Is that a workaround for any other issue?

Kos avatar Apr 20 '25 18:04 Kos

It does that because Kobo doesn't render no-break hyphens. See https://github.com/standardebooks/tools/blob/9b647cf70fd7c7416c9cdbfd79775947779e9b78/se/se_epub_build.py#L774

Maybe Kobo has fixed that since we added that compatibility tweak?
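
For reference, the substitution discussed above can be sketched in a few lines of Python. This is a minimal illustration and not the code from se_epub_build.py; the helper name kobo_hyphen_workaround is hypothetical. Building the strings from escapes also removes any doubt about whether page formatting preserved the characters.

```python
import unicodedata

NO_BREAK_HYPHEN = "\u2011"  # NON-BREAKING HYPHEN
WORD_JOINER = "\u2060"      # WORD JOINER

def kobo_hyphen_workaround(text: str) -> str:
    """Hypothetical helper: swap each U+2011 for U+2060 + U+002D + U+2060."""
    return text.replace(NO_BREAK_HYPHEN, f"{WORD_JOINER}-{WORD_JOINER}")

# The italicized word from the test case, written with explicit escapes.
source = "f\u2011r\u2011r\u2011r\u2011r"
converted = kobo_hyphen_workaround(source)

# List the code points of the converted word, one per line.
for c in converted:
    print(f"U+{ord(c):04X} {unicodedata.name(c)}")
```

The printed sequence should match the WORD JOINER / HYPHEN-MINUS listing reported for the SE kepub, which makes it easy to compare against the contents of a built file.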

acabal avatar Apr 22 '25 20:04 acabal

Possibly. Do we have any record of when Kobo was observed not rendering no-break hyphens (e.g. device, version)? If the problem doesn't happen on my Clara, it would be valuable to check whether it is also gone on that device.

I think one of my friends has a Kobo device; I'll ask them to check.

Kos avatar Apr 23 '25 18:04 Kos

Here are the results from a Kobo Libra H2O:

.kepub.epub (SE, word joiners + hyphen-minus)

Image

.kepub.epub (kepubify, reference, non-breaking hyphens)

Image

Here's the summary of my observations:

Issue Kobo Clara Kobo Libra H2O
Word joiners should render without extra spacing after the word
Non-breaking hyphens should display correctly
Prime and degree symbols should render without extra spacing after the symbol

Kos avatar Apr 25 '25 10:04 Kos

I just tested on my Kobo, and no-break hyphen still renders as some kind of too-high hyphen.

If you're using kepubify, you should confirm that it is retaining the no-break hyphen and not replacing it with some other character. You can use se extract-ebook to unzip the kepub file and inspect which Unicode character kepubify actually uses.
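
As an alternative to unzipping by hand, a kepub is a zip archive, so the same check can be sketched with Python's standard library. This is a rough sketch and not part of SE tools or kepubify; the function name and the file path in the usage comment are hypothetical, and it assumes the content files are UTF-8 encoded .xhtml files.

```python
import zipfile
from collections import Counter

# Characters relevant to this thread, keyed by code point.
WATCHED = {
    "\u2011": "NON-BREAKING HYPHEN",
    "\u2060": "WORD JOINER",
}

def count_watched_chars(epub_path):
    """Count the watched characters across every .xhtml file in the archive."""
    counts = Counter()
    with zipfile.ZipFile(epub_path) as archive:
        for name in archive.namelist():
            if not name.endswith(".xhtml"):
                continue
            # Assumption: content documents are UTF-8 encoded.
            text = archive.read(name).decode("utf-8")
            for char, label in WATCHED.items():
                counts[label] += text.count(char)
    return counts

# Usage (hypothetical path):
# print(count_watched_chars("test case_kepubify.kepub.epub"))
```

If kepubify retains the original characters, the NON-BREAKING HYPHEN count should be non-zero and the WORD JOINER count should be zero; for the SE kepub build, the reverse would be expected.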

My Kobo is the Aura One running the v4.38.23171 (10/29/24) software, which is the latest. I tested using the built-in "publisher default", "georgia", and "rakuten serif" fonts. All of them render U+2011 as a very high hyphen that looks clearly wrong.

acabal avatar May 01 '25 00:05 acabal