koreader
Dictionary definition with mixed LTR and RTL text does not render correctly
- KOReader version: Latest
- Device: Android and Kobo
- Install Jastrow Dictionary
- Look up definition of כותבת
Incorrect rendering:
Correct rendering in the Alpus app for Android:
What is happening is that the definition starts with an RTL word, and nothing anywhere says that the content is mainly LTR and that the paragraph should be laid out LTR.
What worried me a bit in the first screenshot is that, even though it all looks like it is rendered as an RTL paragraph, the last line is left-aligned when it should be right-aligned.
But that is MuPDF's rendering, as this is an HTML dictionary.
If I change sametypesequence=h (html) to sametypesequence=t (not html) in the .ifo, it gets rendered by our own TextBoxWidget and the RTL paragraph is fully correct:
So, in the text rendering, since the first word is RTL and nothing anywhere says it is an LTR definition, the whole paragraph is assumed to be RTL. We don't yet have an option to tag dictionaries as having mainly LTR or RTL content, and I don't think there is anything for that in the StarDict .ifo format.
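To illustrate the "first word decides" behaviour described above, here is a minimal Lua sketch of a first-strong-character heuristic. It is not KOReader's or MuPDF's actual code. The names `decode`, `is_rtl`, `is_ltr` and `first_strong_direction` are made up for this example, and the character ranges are a deliberate simplification (only ASCII letters count as strong LTR, and the Hebrew/Arabic blocks as a whole count as strong RTL). It expects plain text, not HTML markup.

```lua
-- Decode the UTF-8 sequence starting at byte i; returns codepoint, next index.
local function decode(s, i)
    local b = s:byte(i)
    local n, cp
    if b < 0xC0 then return b, i + 1          -- ASCII (or a stray continuation byte)
    elseif b < 0xE0 then n, cp = 1, b % 0x20  -- 2-byte sequence
    elseif b < 0xF0 then n, cp = 2, b % 0x10  -- 3-byte sequence
    else n, cp = 3, b % 0x08 end              -- 4-byte sequence
    for k = 1, n do
        local c = s:byte(i + k)
        if not c then break end               -- truncated input: stop early
        cp = cp * 64 + c % 64
    end
    return cp, i + n + 1
end

-- Simplified: treat whole Hebrew/Arabic-family blocks as strongly RTL.
local function is_rtl(cp)
    return (cp >= 0x0590 and cp <= 0x08FF)
        or (cp >= 0xFB1D and cp <= 0xFDFF)
        or (cp >= 0xFE70 and cp <= 0xFEFF)
end

-- Simplified: only ASCII letters are treated as strongly LTR.
local function is_ltr(cp)
    return (cp >= 0x41 and cp <= 0x5A) or (cp >= 0x61 and cp <= 0x7A)
end

-- The first strong character found decides the paragraph direction.
local function first_strong_direction(text)
    local i = 1
    while i <= #text do
        local cp
        cp, i = decode(text, i)
        if is_rtl(cp) then return "rtl" end
        if is_ltr(cp) then return "ltr" end
    end
    return "ltr" -- no strong character at all: fall back to LTR
end

-- "\215\155" is the UTF-8 encoding of the Hebrew letter kaf (U+05DB).
print(first_strong_direction("\215\155 f. morning star")) -- rtl
print(first_strong_direction("morning star \215\155"))    -- ltr
```

With such a rule, a mostly English definition that happens to open with a Hebrew headword is laid out as an RTL paragraph unless something explicitly states the base direction.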
That's the HTML output we get from the dict:
<strong dir=\"rtl\">¿¿¿¿</strong> f. (<a dir=\"rtl\" class=\"refLink\" href=\"/Jastrow,_¿¿¿ I.1\" data-ref=\"Jastrow, ¿¿¿ I 1\">¿¿¿</a>) <i>morning star</i> (in b. h. <i>a jewel</i>, v. <a class=\"refLink\" href=\"/Jastrow,_¿¿¿¿¿.1\" data-ref=\"Jastrow, ¿¿¿¿¿ 1\">next w.</a>). Y. Yoma III, beg. 40¿; Y. R. Hash. II, beg. 57¿, expl. <a dir=\"rtl\" class=\"refLink\" href=\"/Jastrow,_¿¿¿¿¿ I.1\" data-ref=\"Jastrow, ¿¿¿¿¿ I 1\">¿¿¿¿¿</a>; v. <a class=\"refLink\" href=\"/Jastrow,_¿¿¿ I.1\" data-ref=\"Jastrow, ¿¿¿ I 1\"><span dir=\"rtl\">¿¿¿</span> I</a>.
So, with this, MuPDF (like our text rendering) decides the whole paragraph is RTL (even though it gets the last line wrong).
You can help it treat the content as LTR with a microscript that adjusts the HTML output: create a Jastrow.lua alongside Jastrow.ifo, containing:
return function(html)
    -- return "<div dir='ltr'>" .. html .. "</div>" -- somehow, this does not work
    return "<span dir='ltr'>" .. html .. "</span>" -- but this does
end
and you will get:
(If the dict may return multiple paragraphs, you may need to tweak this microscript to split by line/paragraph and wrap each of them in a <span dir='ltr'>.)
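A rough sketch of such a per-paragraph variant might look like the following. It assumes the dictionary separates paragraphs with <br> tags, which may not hold for your dictionary; the pattern would need adjusting for <p>...</p> or plain newlines. The function name `wrap_ltr` is made up for this example, and inline tags that span a <br> would end up split across two spans.

```lua
-- Hypothetical per-paragraph variant of the microscript above.
-- Assumption: paragraphs are separated by <br>, <br/> or <br /> tags.
local function wrap_ltr(html)
    local out = {}
    local pos = 1
    while true do
        -- Locate the next line break tag, starting at pos
        local s, e = html:find("<br%s*/?>", pos)
        -- Text between the previous break (or start) and this one (or end)
        local chunk = html:sub(pos, s and s - 1 or #html)
        if chunk ~= "" then
            out[#out + 1] = "<span dir='ltr'>" .. chunk .. "</span>"
        end
        if not s then break end
        out[#out + 1] = html:sub(s, e) -- keep the break tag itself, unwrapped
        pos = e + 1
    end
    return table.concat(out)
end

-- In Jastrow.lua, finish the file with:  return wrap_ltr
```

For example, `wrap_ltr("a<br>b")` gives `<span dir='ltr'>a</span><br><span dir='ltr'>b</span>`.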
I'd say it's not a KOReader bug, but the HTML dict could have helped itself by wrapping each result in such a <span dir='ltr'>.
Thanks! But it still leaves me wondering, how does the Alpus app get it right? It's pointing to the same dictionary...
Where are these microscripts documented? Can one be used to change the font of the dictionary popup?
@jayjf
You can create a .css file with the same name next to the .ifo file in the dictionary folder. Then (I am not sure) maybe apply the font via CSS to body?
@poire-z
> I'd say it's not a KOReader bug, but the HTML dict could have helped itself by wrapping each result in such a <span dir='ltr'>.
It is (well, a MuPDF bug). The default document direction is LTR, so the wrapping you refer to is there by default for the entire document unless otherwise specified with dir=auto or dir=rtl. Even with dir=auto I don't think this would be the correct behavior due to the <strong> wrapping, but that's more subtle.
@jayjf
> Where are these microscripts documented?
https://github.com/koreader/koreader/wiki/Dictionary-support#html-encoding-within-stardict-dictionaries-supported
> But it still leaves me wondering, how does the Alpus app get it right? It's pointing to the same dictionary...
Then it uses something other than MuPDF. ;-) (The most obvious guess would be Android WebView.)
> The default document direction is LTR
Oh, right.
Well, after a quick look at (our version of) MuPDF, I couldn't really figure out how its bidi handling works...
It doesn't seem to parse body {direction: ltr}, and it doesn't work as expected if, in htmlboxwidget.lua, we wrap with <body dir='ltr'> instead of just <body>. Neither does wrapping the inner content with <div dir='ltr'>. The only thing that works is <span dir='ltr'>, but we can't make the whole content inline.
> But it still leaves me wondering, how does the Alpus app get it right? It's pointing to the same dictionary...
Maybe because it handles the default direction correctly. Or it doesn't handle RTL at all. I think that even with mainly Hebrew>He or Arabic>Ar dictionaries, which may not wrap their output in a <div dir='rtl'>, it would have to guess/force some behaviour that may not always be the expected one.
> Can one be used to change the font of the dictionary popup?
Only per dictionary, and for HTML dictionaries where we can indeed fix the html with a .lua script, or tweak the styles with a .css file.
> You can create a .css file
See here how you could force your preferred font with the CSS: https://github.com/koreader/koreader/issues/9229#issuecomment-1160476900
I guess you could then use:
* { font-family: myLTRFont }
*[dir=rtl] { font-family: myHebrewFont }
(looks like MuPDF does support [attr=value])
> (our version of) MuPDF
The current version (MuPDF 1.20) behaves the same ftr. :-)
Ok, I got the definition in my preferred font, but how about the word itself that I'm looking up, at the top of the popup? That font doesn't seem to change.
That is part of the UI, unlike the definition which is rendered by MuPDF and that you could easily tweak. So, this word will stick with the font and fallbacks set of the UI - so Hebrew probably comes from our FreeSans font.