mystmd
tex export strips `\&` from bibliography fields
Description
Given the .bib entry:
@article{test,
author={Last Name, First},
journal={Computing in Science {\&} Engineering},
title={Thing \& Other thing},
year={3048},
volume={1},
number={1},
pages={1-2},
keywords={},
}
building tex/pdf with myst build --tex (or --pdf) generates the following bibtex entry in exports/tex/main.bib:
@article{test,
author = {Last Name, First},
journal = {Computing in Science & Engineering},
number = {1},
year = {3048},
pages = {1--2},
title = {Thing & {Other} thing},
volume = {1},
}
resulting in errors like "Misplaced alignment tab character &." in the LaTeX output.
Running a search through npx mystmd@$version suggests that this is a regression in [email protected]:
rm -rf _build exports && npx [email protected] build --tex && cat exports/tex/main.bib
produces the right output, while
rm -rf _build exports && npx [email protected] build --tex && cat exports/tex/main.bib
strips the escape characters.
Proposed solution
preserve characters like \& in bibliography fields
Additional notes
this happens with [email protected] and [email protected], but not [email protected].
Thank you for tracking down this regression!
In that release we started generating bibtex from CSL-JSON using citation-js, rather than just copying in the raw source bibtex. This solution was more generic and allowed us to support citations (e.g. from DOIs) that did not have raw bibtex available. However, it has led to some issues, since CSL-JSON (at least as implemented in citation-js) is lossy and incomplete, compared to relatively permissive and feature-rich bibtex, e.g. see: https://github.com/jupyter-book/mystmd/issues/1284
I'm not quite sure what the right approach is to address this. We could return to persisting raw bibtex, if available, and only generate bibtex if raw is not available. The drawbacks of this are: (1) Raw bibtex is only available on a private field hidden away in the citation-js api; accessing it feels a little shaky. (2) It's never nice to maintain two ways of doing the same thing. (3) Sometimes we need to modify bibtex ids, e.g. if there are duplicates; with raw bibtex, this becomes fragile string manipulation rather than simply updating structured data.
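To make the trade-off concrete, here is a minimal sketch of the "raw bibtex if available, otherwise generate" option. Everything in it is hypothetical: `CitationEntry`, `bibtexFor`, and `renameRawBibtexKey` are illustrative names, not mystmd or citation-js APIs, and the `generate` callback stands in for whatever CSL-JSON to bibtex rendering is in use.

```typescript
// Hypothetical shape: raw bibtex is optional, CSL-JSON is always present.
interface CitationEntry {
  id: string;
  rawBibtex?: string;
  csl: Record<string, unknown>;
}

// Prefer the raw source when we have it; otherwise fall back to generating
// bibtex from the structured data.
function bibtexFor(
  entry: CitationEntry,
  generate: (csl: Record<string, unknown>) => string,
): string {
  return entry.rawBibtex ?? generate(entry.csl);
}

// Drawback (3): renaming a key in raw bibtex is string manipulation.
// This only rewrites the first "@type{key" it finds and assumes a single,
// conventionally formatted entry.
function renameRawBibtexKey(raw: string, newId: string): string {
  return raw.replace(
    /^(\s*@\w+\s*\{\s*)[^,\s]+/,
    (_match: string, prefix: string) => prefix + newId,
  );
}
```

The regex in `renameRawBibtexKey` is exactly the kind of fragile handling mentioned in (3): it would need more care for files with multiple entries, comments, or unusual whitespace.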
The other option is to improve the bibtex rendering coming out of citation-js. To address the specific issue around escaped characters, we could maybe just escape fields before we call format here https://github.com/jupyter-book/mystmd/blob/main/packages/citation-js-utils/src/index.ts#L327 ...? Or we may need our own CSL -> bibtex rendering outside of citation-js... This could take advantage of other bibtex js libraries; there are a ton, but it's hard to know what's good...
Thanks for the pointer. This is easy to reproduce as an upstream bug in citation-js, so we can hope it gets handled there: https://github.com/citation-js/citation-js/issues/232
They do have some formatting code for bibtex export, so it seems handling this is in-scope for citation-js already; it just hasn't come up yet.
If a workaround is appropriate, I suppose mystmd could apply some of its own escaping to the CSL before passing it to the bibtex exporter, assuming it won't double-escape (at least with a pinned version). I don't know how robust that can be, though.
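As a rough illustration of that workaround, the sketch below escapes a few LaTeX special characters in selected CSL-JSON fields while leaving already-escaped ones alone. The function names, the choice of characters (`&`, `%`, `#`), and the default field list are assumptions for illustration only; none of this is existing mystmd or citation-js code, and it would double-escape if the exporter also escapes these characters (as it may after the upstream fix).

```typescript
// Escape &, % and # unless they are already preceded by a backslash.
// Known gap: an escaped backslash followed by "&" (i.e. "\\&") is treated
// as already escaped.
function escapeLatexSpecials(value: string): string {
  return value.replace(/\\?[&%#]/g, (match: string) =>
    match.startsWith("\\") ? match : "\\" + match,
  );
}

// Return a shallow copy of a CSL-JSON-like item with the chosen string
// fields escaped. Non-string fields are passed through untouched.
function escapeCslFields(
  item: Record<string, unknown>,
  fields: string[] = ["title", "container-title"],
): Record<string, unknown> {
  const copy: Record<string, unknown> = { ...item };
  for (const field of fields) {
    const value = copy[field];
    if (typeof value === "string") {
      copy[field] = escapeLatexSpecials(value);
    }
  }
  return copy;
}
```

Because the replacement skips characters that already carry a backslash, running it twice on the same item gives the same result, which limits the double-escaping risk on the mystmd side. It cannot guard against a later exporter version that adds its own escaping, hence the suggestion of a pinned version.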
https://github.com/citation-js/citation-js/issues/232 is fixed upstream, so the next update should close this particular issue.
Thanks @minrk for following this upstream. :)
Any update on this bug, now that it is fixed upstream?