Extra white space after non-breaking hyphens on the Kobo build
Symptom
When the kepub (se) build is viewed on a Kobo device, extra white space appears after any word containing non-breaking hyphens.
The same issue is not observed on the epub build or the kepub build created using kepubify.
Correct:
Incorrect:
Complete test case available on this branch: https://github.com/Kos/jules-verne_robur-the-conqueror/tree/whitespace-rendering-test
XHTML differences
Example line in source and epub:
<p>And the inexplicable <i>f‑r‑r‑r‑r</i> seemed to sweep along below it.</p>
The same line in kepub (se):
<p><span id="kobo.12.1" class="koboSpan">And the inexplicable </span><i><span id="kobo.12.2" class="koboSpan">f-r-r-r-r</span></i><span id="kobo.13.1" class="koboSpan"> seemed to sweep along below it.</span></p>
The same line in kepub (kepubify):
<p><span class="koboSpan" id="kobo.5.1">And the inexplicable </span><i><span class="koboSpan" id="kobo.5.2">f‑r‑r‑r‑r</span></i><span class="koboSpan" id="kobo.5.3"> seemed to sweep along below it.</span></p>
The changes seem structurally the same; only numerical IDs are different.
CSS differences
core.css has some lines removed in kepub (se):
--- "test case.epub.d/epub/css/core.css" 2025-03-08 16:00:02.000000000 +0100
+++ "test case.kepub.epub.d/epub/css/core.css" 2025-03-08 16:00:02.000000000 +0100
@@ -4,9 +4,6 @@
body{
font-variant-numeric: oldstyle-nums;
hyphens: auto;
- adobe-hyphenate: auto;
- -webkit-hyphens: auto;
- -moz-hyphens: auto;
-epub-hyphens: auto;
text-wrap: pretty;
}
@@ -55,10 +52,6 @@
page-break-inside: avoid;
font-variant: small-caps;
hyphens: none;
- adobe-text-layout: optimizeSpeed; /* For Nook */
- adobe-hyphenate: none;
- -webkit-hyphens: none;
- -moz-hyphens: none;
-epub-hyphens: none;
margin-top: 3em;
margin-right: 0;
@@ -168,10 +161,6 @@
break-inside: avoid;
page-break-inside: avoid;
hyphens: none;
- adobe-text-layout: optimizeSpeed; /* For Nook */
- adobe-hyphenate: none;
- -webkit-hyphens: none;
- -moz-hyphens: none;
-epub-hyphens: none;
text-align: center;
}
core.css is unchanged in kepubify.
se.css has some adjustments in kepub (se):
--- "test case.epub.d/epub/css/se.css" 2025-03-08 16:00:02.000000000 +0100
+++ "test case.kepub.epub.d/epub/css/se.css" 2025-03-08 16:00:02.000000000 +0100
@@ -73,10 +73,6 @@
section.epub-type-imprint a,
section.epub-type-colophon a{
hyphens: none;
- adobe-text-layout: optimizeSpeed; /* For Nook */
- adobe-hyphenate: none;
- -webkit-hyphens: none;
- -moz-hyphens: none;
-epub-hyphens: none;
}
@@ -94,7 +90,7 @@
text-indent: 0;
}
-section.epub-type-copyright-page blockquote p span{
+section.epub-type-copyright-page blockquote p span.se{
display: block;
padding-left: 1em;
text-indent: -1em;
@@ -102,4 +98,13 @@
section.epub-type-copyright-page blockquote br{
display: none;
-}
\ No newline at end of file
+}
+
+/* Kobo compatibility CSS */
+
+section[epub|type~="titlepage"] h1,
+section[epub|type~="titlepage"] p,
+section[epub|type~="colophon"] h2,
+section[epub|type~="imprint"] h2{
+ font-size: 0; /* Required for Kobo not to add an extra page to the title */
+}
se.css is again unchanged in kepubify.
testcase.xhtml has some CSS added by kepubify, but it looks irrelevant:
--- "test case.epub.d/epub/text/testcase.xhtml" 2025-03-08 16:00:02.000000000 +0100
+++ "test case_kepubify.kepub.epub.d/epub/text/testcase.xhtml" 2025-03-08 17:00:02.000000000 +0100
@@ -1,30 +1,29 @@
-<?xml version="1.0" encoding="utf-8"?>
-<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" lang="en-US" epub:prefix="z3998: http://www.daisy.org/z3998/2012/vocab/structure/, se: https://standardebooks.org/vocab/1.0" xml:lang="en-US">
- <head>
+<?xml version="1.0" encoding="utf-8"?><html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" lang="en-US" epub:prefix="z3998: http://www.daisy.org/z3998/2012/vocab/structure/, se: https://standardebooks.org/vocab/1.0" xml:lang="en-US"><head>
<title>Chapter</title>
<link href="../css/core.css" rel="stylesheet" type="text/css"/>
<link href="../css/local.css" rel="stylesheet" type="text/css"/>
- </head>
- <body epub:type="bodymatter z3998:fiction">
+ <style type="text/css" class="kobostylehacks">div#book-inner { margin-top: 0; margin-bottom: 0;}</style></head>
+ <body epub:type="bodymatter z3998:fiction"><div id="book-columns"><div id="book-inner">
<section id="chapter-6" role="doc-chapter" epub:type="chapter">
Again, this is an issue with the renderer used by Kobo. When you open a regular epub, you trigger the ADE renderer which is very limited. When you open a kepub, you trigger their advanced renderer which is based on Webkit. The two renderers will render the same epub in very different ways and that's a Kobo problem.
Since you already have the test cases lined up, why don't you continue digging to see exactly which difference is causing these issues?
Update: I did a git bisect over all the changes between:
- (A) .kepub.epub generated with SE tools
- (B) .epub generated with SE tools and converted into .kepub.epub using kepubify
and checked the intermediate versions on my device. This gave me a nice comparison of the changes introduced by SE-specific kepub generation vs. the changes that another tool chooses to employ. Recall that there were artifacts that I observed in (A) but not in (B).
My initial guess about CSS was a miss: I tracked the problem down to a difference in generated XHTML.
SE generates:
<p>And the inexplicable <i>f‑r‑r‑r‑r</i> seemed to sweep along below it.</p>
where the characters of the italicized word are:
LATIN SMALL LETTER F
NON-BREAKING HYPHEN
LATIN SMALL LETTER R
NON-BREAKING HYPHEN
LATIN SMALL LETTER R
NON-BREAKING HYPHEN
LATIN SMALL LETTER R
NON-BREAKING HYPHEN
LATIN SMALL LETTER R
Kepubify leaves the non-breaking hyphens as is when converting to .kepub.epub:
<p><span class="koboSpan" id="kobo.5.1">And the inexplicable </span><i><span class="koboSpan" id="kobo.5.2">f‑r‑r‑r‑r</span></i><span class="koboSpan" id="kobo.5.3"> seemed to sweep along below it.</span></p>
However, the SE kepub actually uses different Unicode code points:
<p><span id="kobo.12.1" class="koboSpan">And the inexplicable </span><i><span id="kobo.12.2" class="koboSpan">f-r-r-r-r</span></i><span id="kobo.13.1" class="koboSpan"> seemed to sweep along below it.</span></p>
LATIN SMALL LETTER F
WORD JOINER
HYPHEN-MINUS
WORD JOINER
LATIN SMALL LETTER R
WORD JOINER
HYPHEN-MINUS
WORD JOINER
LATIN SMALL LETTER R
WORD JOINER
HYPHEN-MINUS
WORD JOINER
LATIN SMALL LETTER R
WORD JOINER
HYPHEN-MINUS
WORD JOINER
LATIN SMALL LETTER R
The code points in the examples above were identified using the Python snippet below, since I can't guarantee that GitHub formatting will keep them intact. You can replicate my results using the built files in the repo: https://github.com/Kos/jules-verne_robur-the-conqueror/tree/whitespace-rendering-test/dist
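import unicodedata

def show(s):
    # Print the Unicode name of every character, one per line.
    for c in s:
        # Characters without a name (e.g. newline) fall back to their code point.
        print(unicodedata.name(c, f"U+{ord(c):04X}"))

show('''...paste...''')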
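To see only what changed between two versions of a line, rather than listing every character, the same idea can be combined with the standard library's difflib. This is just a sketch; the function name codepoint_diff is made up for illustration and is not part of any SE tool.

```python
import difflib
import unicodedata


def codepoint_diff(a: str, b: str) -> list[tuple[list[str], list[str]]]:
    """Return (names removed from a, names added in b) for each differing run."""
    # Name every character; unnamed ones (e.g. newline) fall back to U+XXXX.
    names_a = [unicodedata.name(c, f"U+{ord(c):04X}") for c in a]
    names_b = [unicodedata.name(c, f"U+{ord(c):04X}") for c in b]

    matcher = difflib.SequenceMatcher(None, names_a, names_b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((names_a[i1:i2], names_b[j1:j2]))
    return changes


if __name__ == "__main__":
    # Escapes are used so the invisible characters survive copy and paste.
    source_text = "f\u2011r"
    se_kepub_text = "f\u2060-\u2060r"
    for removed, added in codepoint_diff(source_text, se_kepub_text):
        print("removed:", removed)
        print("added:  ", added)
```

Writing the test strings with \u escapes avoids relying on the page preserving the non-breaking hyphen and word joiner characters.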
Do we know why SE tools replace NON-BREAKING HYPHEN with WORD JOINER + HYPHEN-MINUS + WORD JOINER? Is that a workaround for any other issue?
It does that because Kobo doesn't render no-break hyphens. See https://github.com/standardebooks/tools/blob/9b647cf70fd7c7416c9cdbfd79775947779e9b78/se/se_epub_build.py#L774
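For anyone who wants to experiment on their own device, the substitution described here can be approximated as below. This is a minimal sketch and not the code from se_epub_build.py; the function names are hypothetical, and the reverse function is only there to make it easy to produce a test file with the original characters restored.

```python
NO_BREAK_HYPHEN = "\u2011"  # NON-BREAKING HYPHEN
WORD_JOINER = "\u2060"      # WORD JOINER

# What the SE kepub output contains in place of each non-breaking hyphen.
REPLACEMENT = f"{WORD_JOINER}-{WORD_JOINER}"


def replace_no_break_hyphens(text: str) -> str:
    """Swap each NON-BREAKING HYPHEN for WORD JOINER + HYPHEN-MINUS + WORD JOINER."""
    return text.replace(NO_BREAK_HYPHEN, REPLACEMENT)


def restore_no_break_hyphens(text: str) -> str:
    """Undo the substitution, e.g. to build a comparison file for a device test."""
    return text.replace(REPLACEMENT, NO_BREAK_HYPHEN)


if __name__ == "__main__":
    original = "f\u2011r\u2011r\u2011r\u2011r"
    converted = replace_no_break_hyphens(original)
    # Show the result as code points, since the joiners are invisible.
    print([f"U+{ord(c):04X}" for c in converted])
```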
Maybe Kobo has fixed that since we added that compatibility tweak?
Possibly. Do we have any record of the original observation of Kobo not rendering no-break hyphens (e.g. device, version)? If the problem doesn't happen on my Clara, it would be valuable to check whether it's also gone on that device.
I think one of my friends has a Kobo device, I'll ask to check.
Here are the results from a Kobo Libra H2O:
.kepub.epub (SE, word joiners + hyphen-minus)
.kepub.epub (kepubify, reference, non-breaking hyphens)
Here's the summary of my observations:
| Issue | Kobo Clara | Kobo Libra H2O |
|---|---|---|
| Word joiners should render without extra spacing after the word | ❌ | ❌ |
| Non-breaking hyphens should display correctly | ✅ | ✅ |
| Prime and degree symbols should render without extra spacing after the symbol | ❌ | ✅ |
I just tested on my Kobo, and no-break hyphen still renders as some kind of too-high hyphen.
If you're using kepubify, you should confirm that it is retaining the no-break hyphen and not replacing it with some other character. You can use se extract-ebook to unzip the kepub file and inspect which Unicode character kepubify actually uses.
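As an alternative to unzipping by hand, a kepub is a zip archive, so the check can be scripted. The sketch below counts a few characters of interest in each XHTML file; the function name, the watched-character list, and the file name in the usage note are all assumptions made for illustration.

```python
import zipfile
from collections import Counter

# Characters relevant to this thread, keyed by the character itself.
WATCHED = {
    "\u2011": "NON-BREAKING HYPHEN",
    "\u2060": "WORD JOINER",
    "\u200d": "ZERO WIDTH JOINER",
}


def count_watched_characters(kepub) -> dict[str, Counter]:
    """Map each (X)HTML file in the archive to counts of the watched characters.

    `kepub` may be a path or a binary file-like object.
    Files containing none of the watched characters are omitted.
    """
    results = {}
    with zipfile.ZipFile(kepub) as archive:
        for name in archive.namelist():
            if not name.endswith((".xhtml", ".html")):
                continue
            text = archive.read(name).decode("utf-8", errors="replace")
            counts = Counter(
                {label: text.count(ch) for ch, label in WATCHED.items() if ch in text}
            )
            if counts:
                results[name] = counts
    return results
```

Calling count_watched_characters("book.kepub.epub") on the kepubify output (the file name here is a placeholder) should report NON-BREAKING HYPHEN and no WORD JOINER if kepubify leaves the text untouched.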
My Kobo is the Aura One running the v4.38.23171 (10/29/24) software, which is the latest. I tested using the built-in "publisher default", "georgia", and "rakuten serif" fonts. All of them render U+2011 as a very high hyphen that looks clearly wrong.