Different processing of CSL JSON and YAML
Using pandoc 3.6.1, Unicode superscript characters are passed through as expected with a YAML file, but are modified to use formatted superscripts if the references are stored in JSON.
YAML input
This works as expected:
pandoc -C -t plain << EOT
---
references:
- id: quantinReceptionHuguesRichard2010
  author:
  - family: Quantin
    given: Jean-Louis
  citation-key: quantinReceptionHuguesRichard2010
  collection-number: '22'
  collection-title: Bibliotheca Victorina
  container-title: "L’école de Saint-Victor de Paris: Influence et rayonnement du Moyen Âge à l’Époque moderne"
  DOI: 10.1484/M.BV-EB.3.4428
  editor:
  - family: Poirel
    given: Dominique
  event-place: Turnhout
  ISBN: 978-2-503-53562-3
  issued:
  - year: 2010
  language: fr
  page: 601-642
  publisher: Brepols
  publisher-place: Turnhout
  source: Library of Congress ISBN
  title: "La réception d’Hugues et Richard de Saint-Victor au miroir de leurs *Opera omnia* (XVIᵉ–XVIIᵉ siècles)"
  type: chapter
---
Test [@quantinReceptionHuguesRichard2010].
EOT
Result:
Test (Quantin 2010).
Quantin, Jean-Louis. 2010. “La réception d’Hugues et Richard de
Saint-Victor au miroir de leurs Opera omnia (XVIᵉ–XVIIᵉ siècles).” In
L’école de Saint-Victor de Paris: Influence et rayonnement du Moyen Âge
à l’Époque moderne, edited by Dominique Poirel, 601–42. Bibliotheca
Victorina 22. Turnhout: Brepols. https://doi.org/10.1484/M.BV-EB.3.4428.
The result is identical with an external YAML references file.
JSON input
Running pandoc superscript-json.md -C -t plain, with superscript-json.md as the input document and test.json as the CSL JSON references file, one instead receives:
Test (Quantin 2010).
Quantin, Jean-Louis. 2010. “La réception d’Hugues et Richard de
Saint-Victor au miroir de leurs Opera omnia (XVI^(e)–XVII^(e) siècles).”
In L’école de Saint-Victor de Paris: Influence et rayonnement du Moyen
Âge à l’Époque moderne, edited by Dominique Poirel, 601–42. Bibliotheca
Victorina 22. Turnhout: Brepols. https://doi.org/10.1484/M.BV-EB.3.4428.
Note the undesirable reformatting of ᵉ to ^(e).
pandoc test.json -f csljson -t native -s
will reveal:
, Str "(XVI"
, Superscript [ Str "e" ]
, Str "\8211XVII"
, Superscript [ Str "e" ]
So this happens in converting the JSON to a native pandoc AST. I think this happens in citeproc somewhere, but I need to investigate.
This transformation happens in Citeproc.CslJson. I'm assuming this was something I had to do to conform to the CSL test suite, but I'm not entirely sure.
If your issue is primarily with plain text output, then I could modify the plain text writer to use a Unicode character for superscripted 'e' (and 'r' and 'm') -- we do that with numerals already.
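To make the plain-text idea concrete, here is a minimal Python sketch of the mapping being proposed. It is not pandoc's actual writer code (pandoc is written in Haskell); the table SUPERSCRIPTS and the function render_superscript are hypothetical names used only for illustration. The fallback form ^(...) is the one seen in the plain output above.

```python
# Hypothetical table: the digits plus the three letters mentioned above.
SUPERSCRIPTS = {
    "e": "\u1d49",  # MODIFIER LETTER SMALL E
    "r": "\u02b3",  # MODIFIER LETTER SMALL R
    "m": "\u1d50",  # MODIFIER LETTER SMALL M
    "0": "\u2070", "1": "\u00b9", "2": "\u00b2", "3": "\u00b3",
    "4": "\u2074", "5": "\u2075", "6": "\u2076",
    "7": "\u2077", "8": "\u2078", "9": "\u2079",
}


def render_superscript(text: str) -> str:
    """Render superscripted text for plain output.

    Use Unicode superscript characters only when every character has one;
    otherwise fall back to the ^(...) notation.
    """
    if text and all(ch in SUPERSCRIPTS for ch in text):
        return "".join(SUPERSCRIPTS[ch] for ch in text)
    return f"^({text})"


print(render_superscript("e"))   # single mapped letter
print(render_superscript("th"))  # no mapping for 't' or 'h', so ^(th)
```

The all-or-nothing check avoids mixed output such as a Unicode character followed by a caret form within the same superscript.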
Thank you – it stems from https://github.com/jgm/citeproc/blob/master/test/csl/magic_SuperscriptChars.txt. The discussion to which it refers seems to have intended this conversion as optional, but I assume it is better not to modify the CSL test suite.
I am hoping to get native Unicode superscripts in other formats, and have added a suggestion for working around this problem at https://github.com/jgm/pandoc/issues/10591.
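For readers who want this in any output format, one possible approach (not necessarily the one proposed in the linked issue) is to rewrite the pandoc JSON AST after citation processing. The sketch below is an assumption-laden illustration: UNICODE_SUP and rewrite are hypothetical names, the mapping covers only three letters, and the fragment mirrors the native output shown earlier in this thread.

```python
# Hypothetical, deliberately small mapping; extend as needed.
UNICODE_SUP = {"e": "\u1d49", "r": "\u02b3", "m": "\u1d50"}


def plain_text(inlines):
    """Return the joined text if `inlines` holds only Str nodes, else None."""
    if isinstance(inlines, list) and inlines and all(
        isinstance(i, dict) and i.get("t") == "Str" for i in inlines
    ):
        return "".join(i["c"] for i in inlines)
    return None


def rewrite(node):
    """Recursively replace mappable Superscript nodes with a Unicode Str."""
    if isinstance(node, list):
        return [rewrite(n) for n in node]
    if isinstance(node, dict):
        if node.get("t") == "Superscript":
            text = plain_text(node.get("c"))
            if text and all(ch in UNICODE_SUP for ch in text):
                return {"t": "Str", "c": "".join(UNICODE_SUP[ch] for ch in text)}
        return {k: rewrite(v) for k, v in node.items()}
    return node


# JSON form of the AST fragment from the native output above.
fragment = [
    {"t": "Str", "c": "(XVI"},
    {"t": "Superscript", "c": [{"t": "Str", "c": "e"}]},
    {"t": "Str", "c": "\u2013XVII"},
    {"t": "Superscript", "c": [{"t": "Str", "c": "e"}]},
]

print("".join(n["c"] for n in rewrite(fragment)))
```

To use this as a JSON filter, one would read the document with json.load(sys.stdin), pass it through rewrite, and write it back with json.dump to sys.stdout. Since pandoc applies --citeproc and filters in the order they appear on the command line, the filter would need to come after -C to see the generated bibliography. Superscripts containing anything other than mapped plain characters are left untouched.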
Now that we have the native Unicode superscripts coming through in pandoc, can we close this issue?
We are still generating the <sup> tags for CSL JSON, for purposes of the test suite.
In practice, yes, thank you, I think this can be closed!