
Restrict automatic handling of quotation marks to ASCII " and ' when the "language" CSL field doesn't match the document's "lang" metadata

Open badumont opened this issue 3 years ago • 16 comments

Since Pandoc doesn't seem to format quotation marks according to the lang variable, I have gotten used to typing French quotation marks directly (outer « and », inner “ and ”). The only problem is that citeproc does format quotation marks according to the locale, so my inner quotation marks (“ and ”) were changed to outer quotation marks (« and ») in citations. This occurred in a nested quotation, but it can also alter the rendering of foreign titles whose original typography one may wish, or be required, to retain (for example: Noah Yuval-Hacham, “You Shall Not Make for Yourself Any Graven Image...”: On Jewish Iconoclasm in Late Antiquity, Ars Judaica 6).

Would it be possible to restrict the reformatting of quotation marks to the ASCII characters " and ', as citeproc.js does? As far as I know, citeproc also reformats the “” and ‘’ pairs. With that restriction and a lang variable set to fr-FR, the string “You Shall Not Make for Yourself Any Graven Image...” would be rendered as is, whereas "You Shall Not Make for Yourself Any Graven Image..." would be rendered as « You Shall Not Make for Yourself Any Graven Image... ».
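To make the request concrete, here is a minimal Python sketch of the proposed rule. It is an illustration only, not code from citeproc or citeproc-js: the function name `localize_straight_quotes` is hypothetical, the fr-FR marks are hard-coded, and only the ASCII double quote is handled (the ASCII apostrophe is left out because it cannot be told apart from a real apostrophe without more context).

```python
NBSP = "\u00a0"  # non-breaking space used inside French guillemets

def localize_straight_quotes(text: str,
                             open_q: str = "«" + NBSP,
                             close_q: str = NBSP + "»") -> str:
    """Replace ASCII double quotes with locale marks; leave curly quotes untouched.

    Hypothetical helper: it simply alternates between the opening and the
    closing mark each time it meets a straight double quote.
    """
    out = []
    inside = False  # True while we are between an opening and a closing "
    for ch in text:
        if ch == '"':
            out.append(close_q if inside else open_q)
            inside = not inside
        else:
            out.append(ch)  # curly quotes and everything else pass through
    return "".join(out)

straight = localize_straight_quotes('"You Shall Not Make for Yourself Any Graven Image..."')
curly = localize_straight_quotes("“You Shall Not Make for Yourself Any Graven Image...”")
```

Under this rule the straight-quoted title gets guillemets, while the curly-quoted one is returned unchanged.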

badumont avatar Feb 06 '21 17:02 badumont

I'd love to get comments from CSL people on how this should be handled. @bdarcus is it correct that citeproc.js just leaves curly quotes alone and doesn't do quote flip-flopping with them? That could lead to some bad results if you have curly quotes in your bibliography file, it seems to me.

jgm avatar Feb 07 '21 02:02 jgm

I don't think so. Straight quotes are what we type by default: if we make an effort to type curly quotes, that should be because we want them. It may be a problem only with bibliographic items retrieved automatically by the Zotero connector provided that we want the punctuation to be normalized, but these often need some cleaning anyway. And if curly quotes are automatically changed, what option is left to force them?

badumont avatar Feb 07 '21 09:02 badumont

I don't really know.

But logically, I don't see why one wouldn't.

Copying @fbennett @bwiernik @adam3smith

bdarcus avatar Feb 07 '21 11:02 bdarcus

A lot of bibliographic metadata contains properly typeset quotes. Manually typing quote marks in data is probably the minority of CSL cases, and many publishers include proper quotes.

So, in sum, citeprocs should normalize all quotes. We have discussed an escape syntax to intervene on hyphen/en dash normalization. I imagine such a syntax could apply here too.

But treating all quote marks the same, regardless of character, is expected. See this example in the test suite https://github.com/citation-style-language/test-suite/blob/master/processor-tests/humans/decorations_NestedQuotes.txt

Edit: Aligned this with Frank's comments below.

bwiernik avatar Feb 07 '21 11:02 bwiernik

Couldn't it be controlled by an attribute on cs:style? Something like normalize-quotation-marks, with accepted values all (default) and straight? If it depends on the publisher, it would be more logical to set it in the style than to escape characters in the bibliographic database.

In the meantime, how can we meet the requirements of the minority who need to retain the original punctuation of the cited item if all marks are converted?

badumont avatar Feb 07 '21 12:02 badumont

In citeproc-js, the quote characters recognized beyond straight quotes depend on the language field value (in ISO) of the item. So guillemets in a title of an "en" item will not be touched, but for "fr" items, they will be normalized.

fbennett avatar Feb 07 '21 12:02 fbennett

> In citeproc-js, the quote characters recognized beyond straight quotes depend on the language field value (in ISO) of the item. So guillemets in a title of an "en" item will not be touched, but for "fr" items, they will be normalized.

This library has the same behavior (I believe). Only straight quotes and the quotes specified as inner and outer quotes in the locale you're using are normalized; others are kept as they are.

jgm avatar Feb 07 '21 16:02 jgm

(Rereading, I see that that's the premise of the original question. I'm not following well, should sit this one out.)

fbennett avatar Feb 07 '21 19:02 fbennett

@jgm Yes, I think that's reasonable behavior. For the unusual case where someone wants to prevent proper typesetting of inner/outer quotes, a future general escaping/literal syntax could apply. I don't think any behavior currently needs to change in this library.

bwiernik avatar Feb 07 '21 20:02 bwiernik

The behavior described for citeproc-js isn't the same. Citeproc-js modifies the marks only in those items whose CSL language field is set to the same locale as the document. This prevents curly quotes from being modified in references to English works when the document locale is set to fr. Citeproc, on the other hand, ignores the CSL language variable.
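The language check described in this comment could be sketched as follows. This is a hedged illustration in Python, not citeproc-js code: the names `primary_subtag` and `should_normalize` are hypothetical, comparison is done on the primary language subtag only, and treating an item with no language field as matching the document is an assumption.

```python
from typing import Optional

def primary_subtag(tag: str) -> str:
    """Return the primary language subtag, e.g. 'fr' for 'fr-FR' or 'fr_FR'."""
    return tag.replace("_", "-").split("-")[0].lower()

def should_normalize(item_language: Optional[str], doc_lang: str) -> bool:
    """Decide whether locale-specific quotes in an item should be normalized.

    Assumption: an item without a language field follows the document locale.
    """
    if not item_language:
        return True
    return primary_subtag(item_language) == primary_subtag(doc_lang)

english_item = should_normalize("en", "fr-FR")   # English work cited in a French document
french_item = should_normalize("fr", "fr-FR")    # French work cited in a French document
```

With such a check, the curly quotes in an item marked "en" would be left alone in a document whose lang is fr, while straight quotes could still be converted unconditionally.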

Whatever solution you prefer, the pair '« ' and ' »' is not recognized as quotation marks. Here is a minimal working example (MWE):

test.md

---
lang: fr
references:
- type: article-journal
  id: Frenchquotes
  author:
  - family: Doe
    given: John
  title: Le « titre dans le titre ». Étude
  issued: 2020
---

Command: pandoc -t plain --citeproc test.md

Actual output: Doe, John. 2020. « Le « titre dans le titre ». Étude ».

Expected output: Doe, John. 2020. « Le “titre dans le titre”. Étude ».
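The expected output corresponds to quote "flip-flopping": marks alternate between the outer and inner pair according to nesting depth. The Python sketch below illustrates that idea under stated assumptions. It is not how citeproc is implemented; the names `flipflop`, `OUTER`, `INNER`, and `TOKEN` are hypothetical, only guillemets and curly double quotes are recognized (straight quotes are ignored), and an ordinary or non-breaking space adjacent to a mark is treated as part of the mark.

```python
import re

NBSP = "\u00a0"
OUTER = ("«" + NBSP, NBSP + "»")  # fr-FR outer marks, with non-breaking spaces
INNER = ("“", "”")                # fr-FR inner marks

# An opening mark with an optional following space,
# or a closing mark with an optional preceding space.
TOKEN = re.compile(r"(?P<open>[«“][ \u00a0]?)|(?P<close>[ \u00a0]?[»”])")

def flipflop(text: str, depth: int = 0) -> str:
    """Rewrite recognized quotation marks according to nesting depth.

    depth is the number of quotation levels already open around the text:
    even depth uses the outer pair, odd depth uses the inner pair.
    """
    def repl(match: "re.Match[str]") -> str:
        nonlocal depth
        if match.group("open"):
            mark = (OUTER if depth % 2 == 0 else INNER)[0]
            depth += 1
        else:
            depth = max(depth - 1, 0)  # tolerate unbalanced closing marks
            mark = (OUTER if depth % 2 == 0 else INNER)[1]
        return mark
    return TOKEN.sub(repl, text)

title = "Le « titre dans le titre ». Étude"
# The style wraps the title in outer quotes, so the title itself starts at depth 1.
rendered = OUTER[0] + flipflop(title, depth=1) + OUTER[1]
```

Here the guillemets inside the title are at depth 1 and therefore become the inner pair, which gives the nesting shown in the expected output (with non-breaking spaces inside the outer guillemets).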

badumont avatar Feb 07 '21 22:02 badumont

OK, thanks for this. I have to look into this, but I suspect the reason the guillemets aren't being recognized is that the locale file has

    <term name="open-quote">«&#160;</term>
    <term name="close-quote">&#160;»</term>

A good experiment would be to remove the &#160; and see if that changes things. (If it does, we'll have to consider how to fix things so this works properly.)

jgm avatar Feb 08 '21 18:02 jgm

I modified both my custom CSL style and the locales-fr-FR.xml file bundled with Pandoc, with no success. I also removed the non-breaking spaces in the item metadata.

badumont avatar Feb 08 '21 18:02 badumont

Another thing to keep in mind is that some languages (e.g. German) have multiple quote variants that can be used interchangeably, but each document should be consistent, of course. Is there a way to take care of this?

denismaier avatar Feb 08 '21 18:02 denismaier

@denismaier The only mechanism provided for quote normalization in CSL is the inner-quote and outer-quote terms, for which a locale can provide one each.

bwiernik avatar Feb 08 '21 18:02 bwiernik

But the only scenario in which this wouldn't solve the multiple quotation variants per language problem is if someone had typographic quotes type 1 in their data and then wrote a text relying on typographic quotes type 2. In that case they'll have to normalize their quotes (e.g. to straight quotes) and the style will convert into whichever form its locale (which can be a built-in locale with type 2 quotes) prescribes.
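The normalization step mentioned here (converting typographic quotes in the data to straight quotes so that the locale can then decide the final form) might look like the following Python sketch. The table `TO_STRAIGHT` and the function `to_straight_quotes` are hypothetical, the list of marks is illustrative rather than exhaustive, and spaces placed next to guillemets are not removed.

```python
# Map a few common typographic quotation marks to their ASCII counterparts.
TO_STRAIGHT = str.maketrans({
    "„": '"', "“": '"', "”": '"',   # German-style and English-style double quotes
    "«": '"', "»": '"',             # guillemets (used in either direction in German)
    "‚": "'", "‘": "'", "’": "'",   # single quotes; ’ also serves as an apostrophe
})

def to_straight_quotes(text: str) -> str:
    """Replace typographic quotes with straight ones, character by character."""
    return text.translate(TO_STRAIGHT)

type_1 = to_straight_quotes("„Titel im Titel“")
type_2 = to_straight_quotes("»Titel im Titel«")
```

Both German variants end up as the same straight-quoted string, which the processor can then render with whichever marks the active locale prescribes.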

adam3smith avatar Feb 08 '21 18:02 adam3smith

@adam3smith that makes sense!

denismaier avatar Feb 08 '21 19:02 denismaier