Add arXiv identifier
Summary
Adds an arXiv identifier ARXIV similar to PMID, PMCID and DOI.
Makes it possible to add the recommended form of citing arXiv submissions to a bibliography template with <text variable="ARXIV" prefix="arXiv:"/>.
Context
The arXiv references come in two forms:
- pre 2007:
  arXiv:hep-th/9603067
- post 2007:
  arXiv:2412.11645 [hep-ex]
where the identifier itself can also have a version, e.g. 2412.11645v2.
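To make the two forms concrete, here is a minimal sketch of how a bare identifier could be classified. It is not code from this PR: `ArxivId`, `split_version` and `parse_arxiv_id` are hypothetical names, the length checks (7 digits for the old scheme, 4 or 5 digits after the dot for the new one) are my reading of the arXiv scheme, and a trailing class such as `[hep-ex]` is assumed to have been stripped already.

```rust
/// The two arXiv identifier schemes (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum ArxivId {
    /// Old scheme, e.g. `hep-th/9603067`: archive name, slash, 7 digits.
    Old { archive: String, number: String, version: Option<u32> },
    /// New scheme, e.g. `2412.11645`: YYMM, a dot, then 4 or 5 digits.
    New { yymm: String, number: String, version: Option<u32> },
}

/// True if `s` is non-empty and consists only of ASCII digits.
fn all_digits(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit())
}

/// Splits a trailing `vN` version suffix off an identifier, if present.
fn split_version(s: &str) -> (&str, Option<u32>) {
    if let Some(pos) = s.rfind('v') {
        let (head, tail) = (&s[..pos], &s[pos + 1..]);
        if all_digits(tail) {
            if let Ok(v) = tail.parse::<u32>() {
                return (head, Some(v));
            }
        }
    }
    (s, None)
}

/// Parses a bare identifier, with or without the `arXiv:` prefix.
/// Returns `None` for anything that matches neither scheme.
fn parse_arxiv_id(input: &str) -> Option<ArxivId> {
    let s = input.trim();
    let s = s.strip_prefix("arXiv:").unwrap_or(s);
    let (body, version) = split_version(s);

    // Old scheme: everything before the slash is the archive name.
    if let Some((archive, number)) = body.split_once('/') {
        let archive_ok = !archive.is_empty()
            && archive
                .bytes()
                .all(|b| b.is_ascii_alphabetic() || b == b'-' || b == b'.');
        if archive_ok && number.len() == 7 && all_digits(number) {
            return Some(ArxivId::Old {
                archive: archive.to_string(),
                number: number.to_string(),
                version,
            });
        }
        return None;
    }

    // New scheme: YYMM.NNNN or YYMM.NNNNN.
    let (yymm, number) = body.split_once('.')?;
    let number_ok = (number.len() == 4 || number.len() == 5) && all_digits(number);
    if yymm.len() == 4 && all_digits(yymm) && number_ok {
        Some(ArxivId::New {
            yymm: yymm.to_string(),
            number: number.to_string(),
            version,
        })
    } else {
        None
    }
}
```

With this sketch, `parse_arxiv_id("hep-th/9603067")` yields the `Old` variant and `parse_arxiv_id("arXiv:2412.11645v2")` yields the `New` variant with `version: Some(2)`.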
See also the biblatex manual section "3.14.7 Electronic Publishing Information", which basically says that arXiv submissions are given in their format as
eprint = {identifier},
eprinttype = {arxiv},
eprintclass = {class},
with a few aliases like primaryclass for eprintclass which are already implemented in https://github.com/typst/biblatex/pull/75 but are not yet in the published release.
Related discussions
- https://github.com/typst/hayagriva/issues/302
- https://github.com/typst/typst/discussions/3006
- https://forum.typst.app/t/how-can-i-show-the-arxiv-identifier-in-the-bibliography/3330
- https://discord.com/channels/1054443721975922748/1088371919725793360/1337798216632242186
- https://discord.com/channels/1054443721975922748/1088371919725793360/1111970697837809735
- https://discord.com/channels/1054443721975922748/1088371919725793360/1090915615142846494
The actual change
Before this PR, a hayagriva import of a BibTeX file ignores the eprintclass and saves only the identifier as a serial number. Neither the class nor the identifier is accessible from CSL styles. After this PR, the class is also added to the arXiv serial number, and the import still works when no class is given.
serial-number:
- arxiv: '{identifier} [{class}]'
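As an illustration of the format above, the following sketch builds the serial-number string from an identifier and an optional class. The function name `arxiv_serial` is hypothetical and this is not the code in the PR; it only shows the intended shape, including the fallback when no class is present.

```rust
/// Builds the value stored under the `arxiv` serial number
/// (hypothetical helper, not the PR's implementation).
fn arxiv_serial(identifier: &str, class: Option<&str>) -> String {
    // Treat a missing or blank class the same way: emit the identifier alone.
    match class.map(str::trim).filter(|c| !c.is_empty()) {
        Some(c) => format!("{} [{}]", identifier, c),
        None => identifier.to_string(),
    }
}
```

For example, `arxiv_serial("2412.11645", Some("hep-ex"))` gives `2412.11645 [hep-ex]`, while `arxiv_serial("hep-th/9603067", None)` gives `hep-th/9603067`.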
For the tests I added it to the end of the APS style, which looks like this
[1] R. Aaij others, Test of lepton flavor universality with $B^+ \to K^+ \pi^+ \pi^- \ell^+ \ell^-$ decays, Phys. Rev. Lett. 134, 181803 (2025), arXiv:2412.11645 [hep-ex].
[2] N. Itzhaki, Some remarks on 't Hooft's S-matrix for black holes, (1996), arXiv:hep-th/9603067.
The first two commits are the changes discussed above, and I sneaked in two "cleanup" commits. One deletes an unused file in the tests folder, and the other adds custom error types for the CLI.
Fixes https://github.com/typst/hayagriva/issues/302. Depends on https://github.com/typst/biblatex/pull/75 and https://github.com/typst/citationberg/pull/24.
The problem is that the CSL spec has no variable for arXiv, and therefore no CSL style will use it. I propose opening an issue on the CSL spec repo as well.
I opened a PR to add the identifier to their schema as well.
Thank you for the contribution. For the time being, I'll consider this blocked on upstream (CSL schema). I'll note however that I've seen some styles which seemed to use some unofficial CSL variables, including ARXIV (but also identifiers for some less known journals and publishers), which is also why they aren't supported by hayagriva, so it is not unheard of. I would like to see the CSL team's position on the matter first though (if this is intended to be added to a future CSL version or not).
You make a great point for custom bibliography variables and styles. It feels weird that hayagriva has to know of a journal in order to display an entry. There are custom bib styles but one cannot add variables without compiling a custom hayagriva. Maybe it would be nice to add a way to configure a map of serial-number to custom variables.
I think the CSL styles should be a large list of styles that users can depend on for professional style and should never be a constraint. That sounds dramatic, but look at this lovely open issue https://github.com/citation-style-language/schema/issues/131 from 2016, where the proposed workaround is using another field (PMID). Or this one https://github.com/citation-style-language/schema/issues/350 from 2020. However, this proposed identifier is exactly what we already have in hayagriva in the serial-number dictionary, except that we cannot use the data saved in those fields.
Current write-only implementation of the arXiv serial-number
https://github.com/typst/hayagriva/blob/0c3c7003b989f706296b7a30b743cc4431faf859/docs/file-format.md?plain=1#L324
https://github.com/typst/hayagriva/blob/0c3c7003b989f706296b7a30b743cc4431faf859/CHANGELOG.md?plain=1#L70
https://github.com/typst/hayagriva/blob/0c3c7003b989f706296b7a30b743cc4431faf859/src/lib.rs#L716-L724
https://github.com/typst/hayagriva/blob/0c3c7003b989f706296b7a30b743cc4431faf859/src/interop.rs#L454-L456
I find it highly unlikely that CSL would introduce a new variable just for a single preprint server. Even if it's large.
I already linked the discussion on the topic: https://github.com/citation-style-language/schema/issues/350#issue-675718793
They said that they want to put it into an identifier array variable (essentially hayagriva's serial-numbers). However, the CSL schema repository seems abandoned.
@PgBiel
I'll note however that I've seen some styles which seemed to use some unofficial CSL variables, including ARXIV (but also identifiers for some less known journals and publishers), which is also why they aren't supported by hayagriva, so it is not unheard of.
I have given it some thought.
Proposition
What do you think about adding an extra-variables key to the hayagriva spec that holds a name -> content mapping, and then making its entries available in templates as variables under their name, maybe with some prefix, or even with a tag extravariable instead of variable?
This would allow
- user-defined fields available in templates
- using it (as staging area) for not officially CSL supported fields (e.g. for bibtex compatibility)
both without interfering with the CSL spec. We could even go for something longer like custom-extra-variables to make collisions with CSL even less likely, and add a warning message if a template definition uses a variable that is not in the official CSL spec.
If you are interested I can throw together a PR.
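To sketch what the lookup could look like: the types below are entirely hypothetical and do not mirror hayagriva's real `Entry`. The `standard` map merely stands in for variables the CSL spec already defines, and `extra_variables` is the proposed new map. Returning a distinct `Extra` variant is one way a caller could attach the warning mentioned above.

```rust
use std::collections::BTreeMap;

/// Hypothetical, pared-down entry: only what the lookup needs.
struct Entry {
    /// Stand-in for variables that the CSL spec already defines.
    standard: BTreeMap<String, String>,
    /// The proposed `extra-variables` map (name -> content).
    extra_variables: BTreeMap<String, String>,
}

/// Outcome of resolving a variable name used in a template.
#[derive(Debug, PartialEq)]
enum Resolved<'a> {
    /// Found among the spec-defined variables.
    Standard(&'a str),
    /// Found only in `extra-variables`; a caller could warn here.
    Extra(&'a str),
    /// Not defined anywhere on this entry.
    Missing,
}

impl Entry {
    /// Spec-defined variables win, so an extra variable can never
    /// shadow an official one.
    fn resolve(&self, name: &str) -> Resolved<'_> {
        if let Some(v) = self.standard.get(name) {
            return Resolved::Standard(v.as_str());
        }
        match self.extra_variables.get(name) {
            Some(v) => Resolved::Extra(v.as_str()),
            None => Resolved::Missing,
        }
    }
}
```

Checking the standard map first is a deliberate choice in this sketch: it keeps existing CSL styles behaving exactly as before, whatever a user puts into the extra map.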
I can see the value in having something like that, but I would hold off from adding this for now. Hayagriva's format so far has attempted to be fairly independent from CSL itself (there was also a period where it didn't use CSL at all, though it was much more limited then).
I wouldn't discard the idea entirely, but for now, I think we'll want to focus on developing Typst-side solutions allowing full customization of rendered bibliography entries. This PR https://github.com/typst/typst/pull/5932 is the closest we have gotten so far to getting this to work, though some technical challenges kept it from progressing. I hope we can find a solution to them soon.