
bidi disables all hyphenation in \text… and \foreignlanguage

Open logological opened this issue 1 year ago • 9 comments

With TeX Live 2024 updated to 2024-06-14, using the bidi package (or invoking \setotherlanguage with any language that loads bidi) disables hyphenation in the output of the \text… and \foreignlanguage commands. Hyphenation still works in \begin{language}…\end{language} environments.

Here's an example file demonstrating the problem:

\documentclass{article}

\usepackage{polyglossia}
\setdefaultlanguage{german}
\setotherlanguage{persian} % or just \usepackage{bidi}

\begin{document}
% Hyphenation works outside of any command or environment:
\parbox{0pt}{\hspace{0pt}Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz}

% Hyphenation works in a {language} environment:
\parbox{0pt}{\hspace{0pt}\begin{german}Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz\end{german}}

% Hyphenation doesn't work in \foreignlanguage:
\parbox{0pt}{\hspace{0pt}\foreignlanguage{german}{Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz}} 

% Hyphenation doesn't work in \text…:
\parbox{0pt}{\hspace{0pt}\textgerman{Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz}}

\end{document}

Attachments: test.pdf, test.log

The discussion in latex3/latex2e#1368 indicates that a great many packages stopped working in conjunction with bidi following a recent update to the array package. Perhaps this is yet another example. I will also report this issue to the developer of bidi, though they do not seem to have been active in the last eight months.

logological avatar Jun 18 '24 01:06 logological

Thanks for the report. This seems to be a bug in XeTeX. Here is a more minimal example demonstrating the problem:

%\font\foo="[lmroman10-regular]" at 10pt\foo
\TeXXeTstate=1
\vbox{\hsize=0pt\hskip0pt Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz}
\vskip10pt
% Hyphenation fails inside \beginL...\endL once a modern font is loaded:
\vbox{\hsize=0pt\hskip0pt\beginL Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz\endL}
\bye

If I uncomment the first line (i.e. load a modern font), the bug occurs. The only thing we can do on polyglossia's side is to avoid using \beginL etc. when they are not needed, but that will not solve the problem in cases where such phrases are used inside an RTL paragraph.

Udi-Fogiel avatar Jun 18 '24 02:06 Udi-Fogiel

I forgot to mention: I can reproduce the bug with older TeX Live releases, so it is probably neither new nor related to the latest update.

Udi-Fogiel avatar Jun 18 '24 02:06 Udi-Fogiel

Reported at https://sourceforge.net/p/xetex/bugs/202/

Udi-Fogiel avatar Jun 18 '24 03:06 Udi-Fogiel

Thanks for investigating. I'll follow the XeTeX bug report.

logological avatar Jun 18 '24 13:06 logological

I've pushed a commit to the branch I'm working on (named udi). Your example works there, but as I said, if the direction change is really needed then the problem still exists (for example, if persian were the main language in your example).

As mentioned by Jonathan Kew in the bug report, you can append \hskip0pt to the end of the text. I'm currently not sure whether that will have other side effects, so for now I will not add it to the package, but it might get added in the future.

@jspitz @reutenauer do you know whether adding \hskip0pt in such cases can have an effect such as introducing a possible line break, and thus moving the \endL/\endR node to a new line, which could change the output?

Udi-Fogiel avatar Jun 18 '24 13:06 Udi-Fogiel

As mentioned by Jonathan Kew in the bug report, you can append \hskip0pt to the end of the text, I'm currently not sure if that will have other side effects, so for now will not add that to the package, but it might get added in the future.

I can confirm that appending \hskip0pt to the end of the text has the side effect of allowing a line break at that point. Such a break is undesirable if the \foreignlanguage command is followed by something like punctuation: with, say, \foreignlanguage{greek}{Ψυχοφθόρα}, I do not want TeX to break the line before the comma.

logological avatar Jun 21 '24 21:06 logological

\kern0pt instead of \hskip0pt could work.

u-fischer avatar Jun 22 '24 09:06 u-fischer

Yes, \kern0pt seems to work. I get the hyphenation without any unwanted line breaks.

logological avatar Jun 22 '24 22:06 logological
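
The workaround confirmed in the comment above can be sketched by adapting the original test file. This is only an illustration for XeLaTeX: placing \kern0pt at the very end of the argument follows the suggestion made in this thread, and it has not been checked for side effects beyond those discussed here.

```latex
\documentclass{article}

\usepackage{polyglossia}
\setdefaultlanguage{german}
\setotherlanguage{persian} % loads bidi, which triggers the problem

\begin{document}
% Workaround from the thread: end the argument with an explicit zero-width kern.
\parbox{0pt}{\hspace{0pt}\foreignlanguage{german}{Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz\kern0pt}}

% The same idea applied to the \textgerman form:
\parbox{0pt}{\hspace{0pt}\textgerman{Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz\kern0pt}}

\end{document}
```

Unlike glue, a kern is a legal breakpoint only when glue immediately follows it, which is why punctuation placed directly after the command should not be separated from the word.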

@u-fischer Thanks Ulrike. Just to be sure, could this lead to changes in rare cases where last-node commands (e.g. \lastskip) are used just after a language switch? If so, we'll probably just have to add a hook or a new option.

Udi-Fogiel avatar Jul 02 '24 21:07 Udi-Fogiel

I believe adding a zero-width kern can cause problems in some cases, so it shouldn't be done unconditionally. Since this is not our bug, and a user can easily define a macro that adds a kern, I'll close this for now.

Udi-Fogiel avatar Jul 16 '24 19:07 Udi-Fogiel
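
A user-side macro of the kind mentioned in the closing comment might look like the following sketch. The names \hyphforeignlanguage and \hyphtextgerman are hypothetical and are not provided by polyglossia or bidi; the definitions simply append the zero-width kern discussed in this thread.

```latex
% Hypothetical helpers for the preamble; these names are not part of polyglossia or bidi.
% Each one appends an explicit zero-width kern to the end of the text argument.
\newcommand{\hyphforeignlanguage}[2]{\foreignlanguage{#1}{#2\kern0pt}}
\newcommand{\hyphtextgerman}[1]{\textgerman{#1\kern0pt}}

% Usage in the document body:
% \hyphforeignlanguage{german}{Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz}
```

Keeping the kern in a separate wrapper leaves the standard commands untouched, so code that inspects the last node (for example with \lastskip) after an ordinary \foreignlanguage call should behave as before.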