Different processing of CSL JSON and YAML

Open adunning opened this issue 11 months ago • 3 comments

Using pandoc 3.6.1, Unicode superscript characters are passed through as expected with a YAML file, but are modified to use formatted superscripts if the references are stored in JSON.

YAML input

This works as expected:

pandoc -C -t plain << EOT

---
references:
- id: quantinReceptionHuguesRichard2010
  author:
    - family: Quantin
      given: Jean-Louis
  citation-key: quantinReceptionHuguesRichard2010
  collection-number: '22'
  collection-title: Bibliotheca Victorina
  container-title: "L’école de Saint-Victor de Paris: Influence et rayonnement du Moyen Âge à l’Époque moderne"
  DOI: 10.1484/M.BV-EB.3.4428
  editor:
    - family: Poirel
      given: Dominique
  event-place: Turnhout
  ISBN: 978-2-503-53562-3
  issued:
    - year: 2010
  language: fr
  page: 601-642
  publisher: Brepols
  publisher-place: Turnhout
  source: Library of Congress ISBN
  title: "La réception d’Hugues et Richard de Saint-Victor au miroir de leurs *Opera omnia* (XVIᵉ–XVIIᵉ siècles)"
  type: chapter
---

Test [@quantinReceptionHuguesRichard2010].

EOT

Result:

Test (Quantin 2010).

Quantin, Jean-Louis. 2010. “La réception d’Hugues et Richard de
Saint-Victor au miroir de leurs Opera omnia (XVIᵉ–XVIIᵉ siècles).” In
L’école de Saint-Victor de Paris: Influence et rayonnement du Moyen Âge
à l’Époque moderne, edited by Dominique Poirel, 601–42. Bibliotheca
Victorina 22. Turnhout: Brepols. https://doi.org/10.1484/M.BV-EB.3.4428.

The result is identical with an external YAML references file.

JSON input

Running pandoc superscript-json.md -C -t plain with superscript-json.md and test.json, one instead receives:

Test (Quantin 2010).

Quantin, Jean-Louis. 2010. “La réception d’Hugues et Richard de
Saint-Victor au miroir de leurs Opera omnia (XVI^(e)–XVII^(e) siècles).”
In L’école de Saint-Victor de Paris: Influence et rayonnement du Moyen
Âge à l’Époque moderne, edited by Dominique Poirel, 601–42. Bibliotheca
Victorina 22. Turnhout: Brepols. https://doi.org/10.1484/M.BV-EB.3.4428.

Note the undesirable reformatting of ᵉ to ^(e).

adunning, Jan 09 '25 11:01

pandoc test.json -f csljson -t native -s

will reveal:

, Str "(XVI"
, Superscript [ Str "e" ]
, Str "\8211XVII"
, Superscript [ Str "e" ]

So this happens in converting the JSON to a native pandoc AST. I think this happens in citeproc somewhere, but I need to investigate.

jgm, Jan 09 '25 16:01
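
Editor's note: the native output above suggests that each Unicode superscript letter in the CSL JSON title is split out into its own Superscript element. The Python sketch below imitates that behaviour for illustration only. It is not citeproc's code (citeproc is written in Haskell), and both the function name split_superscripts and the choice to detect superscripts through the Unicode `<super>` compatibility decomposition are assumptions made for this example.

```python
import unicodedata


def split_superscripts(text):
    """Split text into ("Str", s) and ("Superscript", s) runs.

    Hypothetical helper: a character counts as a superscript when its
    Unicode decomposition is tagged <super>, e.g. U+1D49 -> "<super> 0065".
    """
    runs = []
    for ch in text:
        decomp = unicodedata.decomposition(ch)
        if decomp.startswith("<super> "):
            kind = "Superscript"
            # The decomposition lists the base code points in hexadecimal.
            piece = "".join(chr(int(cp, 16)) for cp in decomp.split()[1:])
        else:
            kind, piece = "Str", ch
        # Merge adjacent characters of the same kind into one run.
        if runs and runs[-1][0] == kind:
            runs[-1] = (kind, runs[-1][1] + piece)
        else:
            runs.append((kind, piece))
    return runs


if __name__ == "__main__":
    for run in split_superscripts("(XVIᵉ–XVIIᵉ"):
        print(run)
```

Applied to the fragment from the title, this yields the same alternation of Str and Superscript runs as the native output, with the en dash (written \8211 in the native format) left inside the second Str.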

This transformation happens in Citeproc.CslJson. I'm assuming this was something I had to do to conform to the csl test suite, but I'm not entirely sure.

If your issue is primarily with plain text output, then I could modify the plain text writer to use a unicode character for superscripted 'e' (and 'r' and 'm') -- we do that with numerals already.

jgm, Jan 09 '25 16:01
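
Editor's note: the plain text writer change proposed above could work roughly as in the following Python sketch. This is an illustration under stated assumptions, not pandoc's implementation: the table SUPERSCRIPTS and the function render_superscript_plain are hypothetical names, and the table covers only the digits plus the letters 'e', 'r' and 'm' mentioned in the comment. Content with any other character falls back to the ^(...) form seen in the JSON result.

```python
# Hypothetical lookup table: digits plus the three letters named in the comment.
SUPERSCRIPTS = dict(zip("0123456789erm", "⁰¹²³⁴⁵⁶⁷⁸⁹ᵉʳᵐ"))


def render_superscript_plain(content):
    """Render superscripted content for plain text output.

    Use Unicode superscript characters when every character has one;
    otherwise fall back to the ^(...) notation.
    """
    if content and all(ch in SUPERSCRIPTS for ch in content):
        return "".join(SUPERSCRIPTS[ch] for ch in content)
    return "^(" + content + ")"


if __name__ == "__main__":
    print("XVI" + render_superscript_plain("e"))   # letter with a superscript form
    print("4" + render_superscript_plain("th"))    # no superscript form for 't' or 'h' in the table
```

The all-or-nothing check avoids mixed output such as a superscript character followed by a caret group within one superscript.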

Thank you – it stems from https://github.com/jgm/citeproc/blob/master/test/csl/magic_SuperscriptChars.txt. The discussion to which it refers seems to have intended this conversion as optional, but I assume it is better not to modify the CSL test suite.

I am hoping to get native Unicode superscripts in other formats, and have added a suggestion for working around this problem at https://github.com/jgm/pandoc/issues/10591.

adunning, Feb 02 '25 14:02

Now that we have the native Unicode superscripts coming through in pandoc, can we close this issue? We are still generating the <sup> tags for CSL JSON, for purposes of the test suite.

jgm, Oct 23 '25 21:10

In practice, yes, thank you, I think this can be closed!

adunning, Nov 15 '25 22:11