tex export strips `\&` from bibliography fields

Open minrk opened this issue 1 year ago • 5 comments

Description

Given the .bib entry:

@article{test,
    author={Last Name, First},
    journal={Computing in Science {\&} Engineering},
    title={Thing \& Other thing},
    year={3048},
    volume={1},
    number={1},
    pages={1-2},
    keywords={},
}

building tex/pdf with `myst build --tex` or `--pdf` generates this bibtex entry in exports/tex/main.bib:

@article{test,
	author = {Last Name, First},
	journal = {Computing in Science & Engineering},
	number = {1},
	year = {3048},
	pages = {1--2},
	title = {Thing & {Other} thing},
	volume = {1},
}

resulting in errors like: "Misplaced alignment tab character &." in the latex output.

Running a search through npx mystmd@$version suggests that this is a regression in [email protected]:

rm -rf _build exports && npx [email protected] build --tex && cat exports/tex/main.bib

produces the right output, while

rm -rf _build exports && npx [email protected] build --tex && cat exports/tex/main.bib

strips the escape characters.

Proposed solution

preserve characters like \& in bibliography fields

Additional notes

this happens with [email protected] and [email protected], but not [email protected].

minrk avatar Aug 13 '24 10:08 minrk

Thank you for tracking down this regression!

rowanc1 avatar Aug 13 '24 18:08 rowanc1

In that release we started generating bibtex from CSL-JSON using citation-js, rather than just copying in the raw source bibtex. This solution was more generic and allowed us to support citations (e.g. from DOIs) that did not have raw bibtex available. However, it has led to some issues, since CSL-JSON (at least as implemented in citation-js) is lossy and incomplete, compared to relatively permissive and feature-rich bibtex, e.g. see: https://github.com/jupyter-book/mystmd/issues/1284

I'm not quite sure of the right approach to address this. We could return to persisting raw bibtex, if available, and only generate bibtex if raw is not available. The drawbacks of this are: (1) Raw bibtex is only available on a private field hidden away in the citation-js api; accessing it feels a little shaky. (2) It's never nice to maintain two ways of doing the same thing. (3) Sometimes we need to modify bibtex ids, e.g. if there are duplicates; with raw bibtex, this becomes fragile string manipulation rather than simply updating structured data.

The other option is to improve the bibtex rendering coming out of citation-js. To address the specific issue around escaped characters, we could maybe just escape fields before we call format here https://github.com/jupyter-book/mystmd/blob/main/packages/citation-js-utils/src/index.ts#L327 ...? Or we may need our own CSL -> bibtex rendering outside of citation-js... This could take advantage of other bibtex js libraries; there are a ton, but it's hard to know what's good...

fwkoch avatar Aug 13 '24 18:08 fwkoch

Thanks for the pointer. This is easy to reproduce as an upstream bug in citation-js, so we can hope it gets handled there: https://github.com/citation-js/citation-js/issues/232

They do have some formatting code for bibtex export, so it seems handling this is in-scope for citation-js already, it just hasn't come up yet.

If a workaround is appropriate, I suppose mystmd could apply some of its own escaping to the CSL before passing it to the bibtex exporter, assuming it won't double-escape (at least with a pinned version). I don't know how robust that can be, though.
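To make the workaround idea concrete, here is a minimal sketch of escaping that skips characters that are already escaped, so running it twice does not double-escape. The names `escapeBibtexField` and `escapeCslFields` are hypothetical and not part of mystmd or citation-js, and the set of characters handled (`&`, `%`, `#`) is an assumption kept deliberately small.

```typescript
// Hypothetical helpers for illustration; not part of mystmd or citation-js.
// Assumption: only &, % and # are handled. Other LaTeX specials (_, $, {, })
// are left alone because they can be legitimate markup in bibtex fields.
const SPECIAL_CHARS = /(\\*)([&%#])/g;

function escapeBibtexField(value: string): string {
  return value.replace(
    SPECIAL_CHARS,
    (_match: string, backslashes: string, char: string) => {
      // An odd number of preceding backslashes means the character
      // is already escaped, so it is left untouched.
      const alreadyEscaped = backslashes.length % 2 === 1;
      return alreadyEscaped
        ? `${backslashes}${char}`
        : `${backslashes}\\${char}`;
    },
  );
}

// Apply the escaping to selected string fields of a CSL-JSON-like record.
// Non-string values (e.g. author arrays) are passed through unchanged.
function escapeCslFields(
  entry: Record<string, unknown>,
  fields: string[],
): Record<string, unknown> {
  const result: Record<string, unknown> = { ...entry };
  for (const field of fields) {
    const value = result[field];
    if (typeof value === 'string') {
      result[field] = escapeBibtexField(value);
    }
  }
  return result;
}

// Usage with the entry from the issue description, as CSL-JSON field names.
const escaped = escapeCslFields(
  { id: 'test', title: 'Thing & Other thing', 'container-title': 'Computing in Science & Engineering' },
  ['title', 'container-title'],
);
console.log(escaped.title);
```

This only protects against double-escaping by this helper itself. If the bibtex exporter starts escaping these characters on its own (or escapes backslashes), pre-escaped input would come out wrong, which is why pinning the citation-js version would matter.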

minrk avatar Aug 13 '24 21:08 minrk

https://github.com/citation-js/citation-js/issues/232 is fixed upstream, so next update should close this particular issue.

minrk avatar Aug 14 '24 15:08 minrk

Thanks @minrk for following this upstream. :)

rowanc1 avatar Aug 14 '24 15:08 rowanc1

Any update on this bug, since it is fixed upstream?

honnorat avatar Jun 19 '25 16:06 honnorat