citeproc
Restrict automatic handling of quotation marks to ASCII " and ' when the "language" CSL field doesn't match the document's "lang" metadata
As Pandoc doesn't seem to format the quotation marks according to the lang variable, I have gotten used to typing the French quotation marks (outer « and », inner “ and ”). The only problem with that is that citeproc does format quotation marks according to the locale, so that my inner quotation marks (“ and ”) were changed to outer quotation marks (« and ») in citations. This occurred in a nested quotation, but it can also alter the rendering of foreign titles whose original typography one might wish or need to retain (e.g. Noah Yuval-Hacham, “You Shall Not Make for Yourself Any Graven Image...”: On Jewish Iconoclasm in Late Antiquity, Ars Judaica 6).
Would it be possible to restrict the reformatting of quotation marks to the ASCII characters " and ', like citeproc.js does? As far as I know, citeproc also reformats the “” and ‘’ pairs. With that restriction, and with the lang variable set to fr-FR, the string “You Shall Not Make for Yourself Any Graven Image...” would be rendered as is, whereas "You Shall Not Make for Yourself Any Graven Image..." would be rendered as « You Shall Not Make for Yourself Any Graven Image... ».
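To make the request concrete, here is a minimal Python sketch of the proposed behavior. It is not citeproc's or citeproc.js's code. The function name and the default French marks (with non-breaking spaces) are assumptions made for illustration, and only the ASCII double quote is handled, to sidestep the apostrophe ambiguity of '.

```python
def localize_straight_double_quotes(text, open_q="«\u00a0", close_q="\u00a0»"):
    """Replace ASCII double quotes, alternately, with the locale's outer marks.

    Curly quotes and guillemets already present in the text are left untouched.
    Hypothetical helper: the name and defaults are illustrative only.
    """
    out = []
    opening = True  # the next straight quote is assumed to open a quotation
    for ch in text:
        if ch == '"':
            out.append(open_q if opening else close_q)
            opening = not opening
        else:
            out.append(ch)
    return "".join(out)


straight = '"You Shall Not Make for Yourself Any Graven Image..."'
curly = "“You Shall Not Make for Yourself Any Graven Image...”"
print(localize_straight_double_quotes(straight))  # straight quotes become guillemets
print(localize_straight_double_quotes(curly))     # curly quotes are kept as typed
```

With this rule, typing curly quotes by hand would be the way to opt out of locale formatting.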
I'd love to get comments from CSL people on how this should be handled. @bdarcus is it correct that citeproc.js just leaves curly quotes alone and doesn't do quote flip-flopping with them? That could lead to some bad results if you have curly quotes in your bibliography file, it seems to me.
I don't think so. Straight quotes are what we type by default: if we make an effort to type curly quotes, that should be because we want them. It may be a problem only with bibliographic items retrieved automatically by the Zotero connector provided that we want the punctuation to be normalized, but these often need some cleaning anyway. And if curly quotes are automatically changed, what option is left to force them?
I don't really know.
But logically, I don't see why one wouldn't.
Copying @fbennett @bwiernik @adam3smith
A lot of bibliographic metadata contains properly typeset quotes. Manually typing quote marks in data is probably the minority of CSL cases, and many publishers include proper quotes.
So, in sum, citeprocs should normalize all quotes. We have discussed an escape syntax to intervene on hyphen/en dash normalization. I imagine such a syntax could apply here too.
But treating all quote marks the same, regardless of character, is expected. See this example in the test suite https://github.com/citation-style-language/test-suite/blob/master/processor-tests/humans/decorations_NestedQuotes.txt
Edit: Aligned this with Frank's comments below.
Couldn't it be controlled by an attribute on cs:style? Something like normalize-quotation-marks, with accepted values all (default) and straight? If it depends on the publisher, it would be more logical to set it in the style than to escape characters in the bibliographic database.
In the meantime, how can we meet the requirements of the minority who want to retain the original punctuation of the cited item if all marks are converted?
In citeproc-js, the quote characters recognized beyond straight quotes depend on the language field value (in ISO) of the item. So guillemets in a title of an "en" item will not be touched, but for "fr" items, they will be normalized.
This library has the same behavior (I believe). Only straight quotes and the quotes specified as inner and outer quotes in the locale you're using are normalized; others are kept as they are.
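As a rough illustration of that rule, the Python sketch below computes which characters would count as quotation marks for a given locale. It is not the library's code. The term values are assumptions modeled on typical CSL locale files, and stripping the spaces around the French guillemets is a simplification of this sketch only.

```python
# Illustrative locale terms (assumed values, not read from real locale files).
EN_TERMS = {
    "open-quote": "“", "close-quote": "”",
    "open-inner-quote": "‘", "close-inner-quote": "’",
}
FR_TERMS = {
    "open-quote": "«\u00a0", "close-quote": "\u00a0»",
    "open-inner-quote": "“", "close-inner-quote": "”",
}


def recognized_quote_marks(locale_terms):
    """Return the straight quotes plus the marks defined by the locale's terms."""
    marks = {'"', "'"}
    # str.strip() also removes the non-breaking spaces around the guillemets.
    marks.update(term.strip() for term in locale_terms.values())
    return marks


print("«" in recognized_quote_marks(FR_TERMS))  # guillemets belong to the French set
print("«" in recognized_quote_marks(EN_TERMS))  # but not to the English one
```

Under this reading, guillemets in a document using an English locale would be kept as typed, while double curly quotes would be recognized in both locales.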
(Rereading, I see that that's the premise of the original question. I'm not following well, should sit this one out.)
@jgm Yes, I think that's a reasonable behavior. For the unusual case that someone wants to prevent proper typesetting of inner/outer quotes, then a future general escaping/literal syntax could apply. I don't think any behavior currently needs to change in this library.
The behavior described for citeproc-js isn't the same. Citeproc-js modifies the marks only in those items whose CSL language field is set to the same locale as the document. This prevents curly quotes in references to English works from being modified when the document locale is set to fr. The behavior of citeproc, on the other hand, is to ignore the CSL language variable.
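The difference can be sketched as a simple gate in Python. This is only one reading of the behavior described above, not citeproc-js code. The function names are hypothetical, and two choices are assumptions: languages are compared on their primary subtag, and items without a language field are treated as non-matching.

```python
def primary_subtag(tag):
    """Return the primary language subtag, e.g. 'fr' for 'fr-FR' or 'fr_FR'."""
    return tag.replace("_", "-").split("-")[0].lower()


def normalizes_typographic_quotes(item_language, document_lang):
    """Decide whether an item's curly quotes or guillemets would be normalized.

    Assumption: a missing item language counts as "no match".
    """
    if not item_language:
        return False
    return primary_subtag(item_language) == primary_subtag(document_lang)


print(normalizes_typographic_quotes("fr", "fr-FR"))  # French item, French document
print(normalizes_typographic_quotes("en", "fr-FR"))  # English item, French document
```

A processor that ignores the language field would behave as if this function always returned True.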
Whatever solution you prefer, the pair '« ' and ' »' is not recognized as quotation marks. Here is a MWE:
test.md
---
lang: fr
references:
- type: article-journal
  id: Frenchquotes
  author:
  - family: Doe
    given: John
  title: Le « titre dans le titre ». Étude
  issued: 2020
---
Command:
pandoc -t plain --citeproc test.md
Actual output:
Doe, John. 2020. « Le « titre dans le titre ». Étude ».
Expected output:
Doe, John. 2020. « Le “titre dans le titre”. Étude ».
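The expected flip-flopping can be sketched in Python as follows. This is an illustration of the desired result for the French locale, not citeproc's algorithm. The constants and the function name are assumptions, straight quotes are ignored, and spaces next to a recognized mark are dropped so that the replacement mark supplies its own spacing.

```python
OUTER = ("«\u00a0", "\u00a0»")  # assumed French outer marks, with non-breaking spaces
INNER = ("“", "”")              # assumed French inner marks
OPENERS = {"«", "“"}
CLOSERS = {"»", "”"}


def quote_title(title):
    """Wrap a title in outer marks and alternate inner/outer marks for nested quotes."""
    out = [OUTER[0]]
    depth = 1  # the wrapping outer quote is already open
    skip_space = False
    for ch in title:
        if skip_space and ch.isspace():
            continue  # drop spaces that followed an opening mark in the source
        skip_space = False
        if ch in OPENERS:
            out.append((OUTER, INNER)[depth % 2][0])
            depth += 1
            skip_space = True
        elif ch in CLOSERS and depth > 1:
            depth -= 1
            while out and out[-1].isspace():
                out.pop()  # drop spaces that preceded the closing mark
            out.append((OUTER, INNER)[depth % 2][1])
        else:
            out.append(ch)
    out.append(OUTER[1])
    return "".join(out)


print(quote_title("Le « titre dans le titre ». Étude"))
```

For the title in the example, this yields the expected string, with non-breaking spaces inside the outer guillemets.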
OK, thanks for this. I have to look into this, but I suspect the reason the guillemets aren't being recognized is that the locale file has
<term name="open-quote">« </term>
<term name="close-quote"> »</term>
A good experiment would be to remove the non-breaking spaces and see if that changes things. (If it does, we'll have to consider how to fix things so this works properly.)
I modified both my custom CSL style and the locales-fr-FR.xml file bundled with Pandoc, with no success. I also removed the non-breaking spaces in the item metadata.
Another thing to keep in mind is that some languages (e.g. German) have multiple quote variants that can be used interchangeably, but each document should be consistent, of course. Is there a way to take care of this?
@denismaier The only mechanism provided for quote normalization in CSL is the inner-quote and outer-quote terms, for which a locale can provide one each.
But the only scenario in which this wouldn't solve the multiple quotation variants per language problem is if someone had typographic quotes type 1 in their data and then wrote a text relying on typographic quotes type 2. In that case they'll have to normalize their quotes (e.g. to straight quotes) and the style will convert into whichever form its locale (which can be a built-in locale with type 2 quotes) prescribes.
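The normalization step suggested here could be done with a small preprocessing pass over the bibliographic data. The Python sketch below is one possible approach under stated assumptions: the mapping covers only a few common marks, the name to_straight_quotes is hypothetical, and typographic apostrophes are converted along with single quotes.

```python
# Map common typographic quotation marks to their straight ASCII counterparts.
TO_STRAIGHT = str.maketrans({
    "“": '"', "”": '"', "„": '"', "«": '"', "»": '"',
    "‘": "'", "’": "'", "‚": "'",
})


def to_straight_quotes(text):
    """Return text with the mapped typographic marks replaced by straight quotes."""
    return text.translate(TO_STRAIGHT)


print(to_straight_quotes("„Titel“ und »Titel«"))
```

Because straight quotes carry no opening or closing direction, the result no longer depends on which variant was used in the data, and the style's locale decides the final form.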
@adam3smith that makes sense!