luaotfload
soft hyphen problems
The soft hyphen (U+00AD) has some problems.
\documentclass{article}
\usepackage{l3pdf,pdfresources}
\ExplSyntaxOn
\pdf_uncompress:
\ExplSyntaxOff
\usepackage{fontspec}
\setmainfont{TeX Gyre Termes}[Renderer=HarfBuzz]
\setsansfont{Arial}[Renderer=HarfBuzz]
\begin{document}
not^^2dcopy^^2dable
not^^adcopy^^adable
\sffamily
not^^2dcopy^^2dable
not^^adcopy^^adable
\end{document}
Without HarfBuzz, U+00AD does not seem to act as a hyphenation point (which looks like a problem), but apart from this the output looks OK.
With HarfBuzz, the output is rather weird.
Should that be dealt with in luaotfload, or should the TU encoding set it up as an active character equivalent to \-?
@davidcarlisle I think the first issue should be fixed by making ^^ad active and \let to \-, but the issue with HarfBuzz needs a fix in luaotfload.
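A minimal sketch of the "active ^^ad" idea, assuming LuaLaTeX. This is only an illustration of the suggestion, not what the kernel or the TU encoding actually does. It uses a \protected macro rather than a plain \let so that the character still follows \- where LaTeX redefines it locally (for example inside tabbing); that choice is an assumption of this sketch.

```latex
\documentclass{article}
\usepackage{fontspec}
% Illustration only: make U+00AD an active character that expands to \-
\catcode"AD=\active
\protected\def^^ad{\-}
\begin{document}
% The narrow box forces a break at every permitted point,
% including the discretionaries created by ^^ad.
\parbox{1pt}{not^^adcopy^^adable}
\end{document}
```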
The harf problem should be fixed in dev, but I'm a bit worried about the general behaviour. In both harf and node mode the soft hyphen disappears (because it is default ignorable). But I do not think that this is what we want: especially when ^^ad gets mapped to an equivalent of \discretionary{\char"AD}{}{} (which would give more semantic PDF files), we really want to keep the soft hyphen. In node mode we could achieve this by not treating AD as default ignorable, but in harf mode we don't really have enough control over that.
Well, if there is a ^^ad in the user input, it probably can be ignored or mapped to \-.
But what we will want in the end is that we get something like this in the pdf when hyphenation is involved:
Nordrhein- %<-- visible hard hyphen in the pdf
West- %<-- visible soft hyphen in the pdf (or hard hyphen with soft hyphen accsupp?)
falen
so that it copies and pastes back as Nordrhein-Westfalen.
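The "hard hyphen with soft hyphen accsupp" variant could look roughly like the sketch below. It assumes the accsupp package and marks a visible hyphen with U+00AD as its ActualText. The macro name \softhyphenmark is hypothetical, the line breaks are forced by hand rather than produced by hyphenation, and what a viewer actually does on copy and paste varies.

```latex
\documentclass{article}
\usepackage{fontspec}
\usepackage{accsupp}
% Hypothetical helper: a visible hyphen whose ActualText is U+00AD
\newcommand*\softhyphenmark{%
  \BeginAccSupp{method=hex,unicode,ActualText=00AD}-\EndAccSupp{}}
\begin{document}
Nordrhein-\\              % real (hard) hyphen, stays in the copied text
West\softhyphenmark\\     % tagged as a soft hyphen
falen
\end{document}
```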
Well, if there is a ^^ad in the user input, it probably can be ignored or mapped to \-.
That's actually an interesting question: should it be ignored or mapped to \-? After all, \- represents exactly the use case for U+AD, so mapping it seems to be the obvious choice. On the other hand, every time I've actually encountered someone using U+AD it was by accident, so ignoring it might help to prevent errors that are very hard to find. In any case, that should be done in the kernel/inputenc.
But what we will want in the end is that we get something like this in the pdf when hyphenation is involved:
Nordrhein- %<-- visible hard hyphen in the pdf
West- %<-- visible soft hyphen in the pdf (or hard hyphen with soft hyphen accsupp?)
falen
so that it copies and pastes back as Nordrhein-Westfalen.
I agree and I think the best way to do this is to set the prehyphenchar to "AD so that LuaTeX inserts the correct one automatically. That's why I think that we shouldn't treat it as default-ignorable in luaotfload: If it hasn't been intercepted on the TeX level, it probably should become an actual character in the PDF.
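As a rough sketch of that suggestion, assuming LuaLaTeX: \prehyphenchar is the LuaTeX parameter for the character inserted before an automatic hyphenation break. Whether the U+00AD glyph is actually visible in the output depends on the font containing it and on luaotfload not dropping it as default ignorable, which is exactly the open point here.

```latex
\documentclass{article}
\usepackage{fontspec}
\setmainfont{TeX Gyre Termes}
\begin{document}
% Use U+00AD instead of U+002D at automatic hyphenation points
\prehyphenchar="AD
% The narrow box forces the word to be broken wherever possible.
\parbox{1pt}{\hspace{0pt}Westfalen}
\end{document}
```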
The question is how to implement this correctly, especially for harf. Probably we have to tell HarfBuzz to always preserve default ignorable characters and then filter them out ourselves, so that we can apply that kind of customization to the process. (That would also fit very well into some plans I have for ZWNJ.) This could also unify the code paths for HarfBuzz and the fontloader, but it might cause some issues for fonts which do not contain the relevant glyphs.