french typography rules not respected
Explain the problem.
It is related to https://github.com/jgm/pandoc/issues/7976. When writing LaTeX, I usually load babel this way: \usepackage[french]{babel}. With this, the typography rules (most notably the thin non-breaking space before ":;!?") are respected. The default template sets \usepackage[bidi=default]{babel} and then \babelprovide[main,import]{french}, which for an unknown reason results in no space before ":;!?". Replacing \usepackage[bidi=default]{babel} with \usepackage[french,bidi=default]{babel} in the default template fixes the issue.
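A minimal sketch for comparing the two loading methods outside pandoc, assuming pdflatex and a TeX distribution with babel-french installed. The test file itself is a hypothetical example, not something taken from the pandoc template:

```latex
% Hypothetical test file for comparing the two ways of loading French.
\documentclass{article}
\usepackage[T1]{fontenc}

% Variant A: classic .ldf loading. babel-french is active and is expected
% to handle the spacing before : ; ! ?
\usepackage[french]{babel}

% Variant B: the loading method described above for the default template.
% Comment out variant A and uncomment these two lines to compare.
%\usepackage[bidi=default]{babel}
%\babelprovide[main,import]{french}

\begin{document}
Est-ce que ça fonctionne? Oui!
\end{document}
```

Compiling once with each variant and comparing the space before "?" and "!" in the PDF should show whether the spacing rules are applied.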
Pandoc version?
pandoc 2.17.1.1, debian sid
Looking at the babel website, it seems the correct way to set a language is in the \usepackage{babel} options, isn't it? Or maybe there is some problem in babel-french where it only recognizes this way of setting a locale?
https://latex3.github.io/babel/guides/which-method-for-which-language.html
https://latex3.github.io/babel/guides/using-babelprovide-to-modify-or-extend-locales.html
"All the examples assume: \usepackage[english]{babel}"
I don't know. We changed to use this method of setting the babel language in commit https://github.com/jgm/pandoc/pull/7605/commits/3d8f0110042dba2db495fa96e77570ec5eca4c8b @hseg can you comment on this issue?
Without \usepackage[french]{babel}, babel-french does not seem to be enabled at all: if I edit the default template to use any babel-french command after \babelprovide[main,import]{french}, it fails miserably.
It does have some effect: e.g. \tableofcontents gives you "Table des matières." Can you say specifically what is missing, besides the punctuation spacing you noted? The special treatment of ;, ?, and ! is disabled by the following lines in the default template:
% get rid of language-specific shorthands (see #6817):
\let\LanguageShortHands\languageshorthands
\def\languageshorthands#1{}
For background on that, see #6817.
However, it looks like commenting out these lines is not enough by itself to make the punctuation spacing work; one also has to add french to \usepackage{babel}.
Possibly helpful: https://latex3.github.io/babel/guides/which-method-for-which-language.html
This answer uses the method we use (including for latin alphabet languages): https://tex.stackexchange.com/questions/513618/howto-multilanguage-babel
This is also informative: https://tex.stackexchange.com/questions/549682/why-can-arabic-not-be-set-as-the-main-language-with-babel-in-the-package-declara/549712#549712
Am away from my dev box, so can only answer based on looking at manuals.
Note that the provide=* option is equivalent to a \babelprovide command for the main language (see 1.13).
Also, note the form of language import I implemented forcibly imports the relevant .ini file (instead of the .ldf?). I suspect this might be the issue - iiuc, the latter is the more fully-featured definition.
We could try to pass the languages as multiline package options to the initial babel import, unsure if that'd regress my original problems. If doing this, note main language is the last one.
Ah, and I see you've provided the necessary clarification on ini vs ldf 😅.
unsure if that'd regress my original problems.
@hseg do you have a test case we could use to check this?
The file I used is on my dev box, so no. Do recall offhand the cases I checked, though:
- basic embedding (ie getting rtl1 rtl2 LTR4 LTR3 rtl5 rtl6)
- weak directionality - eg numbers, parens at directionality boundaries
- especially for the above, consider international phone numbers at the end of an ltr context - the plus sign should be at left edge
https://www.w3.org/International/articles/inline-bidi-markup/bidi_examples might have more cases worth checking.
I see from my original patch that I tried Greek, but don't recall what I tested that with.
With \usepackage[french]{babel} I can at least get the correct punctuation with shorthand: off or by using xelatex. But, yes, that's suboptimal. A French pandoc user should get correct French typography rules by default, the same way they do with LaTeX :-)
When I saw that the punctuation was not respected I tried using \frenchsetup{} in the default template, which is used to customize babel-french behaviour according to the doc. Without \usepackage[french]{babel} I get:
! Undefined control sequence. ... \frenchsetup
Let me summarize my current understanding of the situation. Babel is in the process of moving from the old .ldf language definitions (which would be loaded by \usepackage[french]{babel}) to the new .ini definitions (which are loaded by \babelprovide[main,import]{french}). This transition is needed because the .ldf definitions don't work well when you combine multiple languages in a document. For many languages (e.g. Hebrew) you really need the .ini definitions. For others (e.g. French), the old .ldf definitions may be more full-featured. I would like to know more about this, though. Is there a reason the punctuation spacing is not supported in the French .ini definition? Is this something that is going to be added, or has it been removed for some reason?
What should pandoc do, if anything? Well, we could move to a more complicated system in which \usepackage[langs]{babel} is used for certain languages and \babelprovide for others. I actually don't know if this would cause problems, nor do I know how we'd determine which languages to treat in which way. Note that, even if we did this, the punctuation spacing wouldn't work unless we also removed the lines disabling shorthands (see #6817 -- as noted in that issue, shorthand:off wasn't sufficient to deal with the issue there). This would cause other problems. Perhaps there is a way to keep the special treatment of punctuation while disabling other shorthands?
In the meantime, your best bet is to create a custom template that modifies the default template just slightly, changing the babel line to
\usepackage[bidi=default$if(babel-lang)$,$babel-lang$$endif$]{babel}
and removing the lines
$if(babel-lang)$
\babelprovide[main,import]{$babel-lang$}
$endif$
as well as
% get rid of language-specific shorthands (see #6817):
\let\LanguageShortHands\languageshorthands
\def\languageshorthands#1{}
You can specify the modified template on the command line with --template, or make it a local default by moving it to the templates subdirectory of your user data directory.
I'd be curious to get feedback from other French-speaking pandoc users on the impact of this issue.
PS. Another possible approach is to create a small Lua filter that does the needed typographical corrections. An advantage of this approach is that it will work for ALL output formats, not just LaTeX.
A bit of searching showed that someone has already done this: https://github.com/InseeFrLab/pandoc-filter-fr-nbsp
It's not quite the same as what babel does, but it's a start.
For example, in French there's always a non-breaking space before a colon : (and before ; ? !). But in LaTeX it's annoying to have to type them each time. So, babel will convert Bonjour! to Bonjour~!.
The Lua filter you found expects that you have already typed Bonjour ! (with a space before the punctuation). At least the test file appears that way.
It shouldn't be too hard to improve the script along those lines.
Indeed. I forked that and got something working more like how babel-french works. See https://github.com/fuhrmanator/pandoc-filter-fr-nbsp/blob/main/fr-nbsp.lua
I've used it on a book project and the LaTeX (PDF) and HTML versions look OK. I added more tests, supporting occurrences of high punctuation after quotes and citations, for example. There are surely more cases to consider, but it's a start.
There are some other typography rules that babel-french applies, but I'm not sure how to handle them in a filter. For example, a caption in English might be Figure 5.1: blah blah but in French it should use a - instead of :.
How does one change this kind of typography inside a filter (is it possible)?
Also, I relied on a hack of pandoc-quotes.lua to put non-breaking spaces before and after the guillemets « and », mostly because I am lazy and didn't feel like reinventing that wheel (I just added the non-breaking spaces into the strings in the conversion table).
Handling all of these things together in one filter seems like it would cover a broad spectrum (many responsibilities), and so some thought needs to go into a good design (I'm not experienced enough with pandoc there). Are there similar Swiss Army Knife filters, or is this goal out of scope for their design (i.e. is it better to have many separate filters)?
How does one change this kind of typography inside a filter (is it possible)?
Depends on the output format. If you're going to LaTeX/PDF, then the text "Figure 5.1:" is supplied by LaTeX itself, not pandoc, and if you set your lang to fr-FR it should be appropriately localized.
If you're going to other output formats, then the text may be added by pandoc itself and there is not much you can do.
I think doing it all in one filter makes sense, as it's probably more efficient.
I mostly use pandoc in French and have had a lot of trouble since the introduction of \babelprovide in the default LaTeX template (which, by the way, is not in sync at all with the one in the pandoc-templates repository, at least for pandoc 3.1.2 at this date). For instance, I am no longer able to set up the French part of babel, or to have dashes used for list items instead of bullets.
My only solution is to maintain a modified LaTeX template (which is a real mess because I need to track the changes to the default template for each new pandoc version).
Daniel Flipo (maintainer of babel-french) answered one of my questions (https://tex.stackexchange.com/questions/671407/why-frenchsetup-is-no-more-defined-when-using-babelprovide) and gave me some hints about the changes in babel, which I translate below in case it can help fix this issue.
In the beginning, babel was created (in the 1990s) by Johannes Braams. It used language files with the .ldf suffix.
Later (2013), Javier Bezos took over the maintenance of babel, which had been abandoned for years. He first fixed bugs in its kernel, then wanted to extend the system to non-European languages (Arabic, Hebrew, Indian languages, etc.) in order to catch up with (and go beyond) the polyglossia system on that point. The idea was to localize babel (in the sense of Unix locales). He used files with the .ini suffix.
Of course he also supplied .ini files for European languages which already had .ldf ones. In the documentation he states explicitly that, for languages having both configuration files, the .ldf must be preferred.
That's the case for French. The .ini file correctly translates dates, \chapter, etc. but does not deal with typography at all. High typography must be managed manually. For instance, one has to explicitly insert a ~ in front of each : and a \, in front of each !, ? or ;.
So using .ini files (through \babelprovide[main,import]{..}) should be reserved for non-European languages.
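To illustrate the manual approach described in the quote, here is a small sketch, assuming pdflatex and the .ini-based loading used by the default template. The sample sentence is made up for illustration:

```latex
% Sketch only: manual spacing when French is loaded from the .ini file.
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[bidi=default]{babel}
\babelprovide[main,import]{french}

\begin{document}
% ~ is a non-breaking space (before the colon);
% \, is a thin space (before ? and !).
Remarque~: est-ce que ça fonctionne\,? Oui\,!
\end{document}
```

With \usepackage[french]{babel} none of this markup should be needed, which is the point of the complaint in this issue.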
As a workaround, you can pass french as a classoption, which is inherited by all packages, including babel. Here is my setup:
---
lang: fr-FR
classoption: french
header-includes: |
  \let\languageshorthands\LanguageShortHands
---
Est-ce que ça fonctionne? Oui!