acl-style-files

arXiv.org citations missing info

Open postylem opened this issue 2 years ago • 8 comments

Expected output: The ACL formatting requirements give the following example for how to cite an eprint on arXiv:

Mohammad Sadegh Rasooli and Joel R. Tetreault. 2015. Yara parser: A fast and accurate dependency parser. Computing Research Repository, arXiv:1503.06733. Version 2.

Actual output:

The LaTeX template provided in this repo (using the BibTeX citation that arXiv provides) gives the following:

Mohammad Sadegh Rasooli and Joel R. Tetreault. 2015. Yara parser: A fast and accurate dependency parser.

It omits the eprint prefix and eprint code.

This seems undesirable, since the resulting reference gives no indication of where the work was published unless the reader follows the hyperlink.


Details:

If one copies the autogenerated bibtex citation from the "Export Bibtex Citation" link on arXiv, one gets the following:

@misc{arxiv.1503.06733,
  doi = {10.48550/ARXIV.1503.06733},
  url = {https://arxiv.org/abs/1503.06733},
  author = {Rasooli, Mohammad Sadegh and Tetreault, Joel},
  title = {Yara Parser: A Fast and Accurate Dependency Parser},
  publisher = {arXiv},
  year = {2015}
}

I notice the bibliography style file acl_natbib.bst has specific code to deal with arXiv references:

https://github.com/acl-org/acl-style-files/blob/2445635ea39619b1fcf6eade762d518bee0d3af6/latex/acl_natbib.bst#L259-L260

but, for some reason (I can't grok .bst files well enough to understand why) the output doesn't comply with the recommended format from the formatting requirements. Instead, using the provided latex template and bst file, the resulting PDF output just has the name, date, and hyperlinked title.

Adding a note field to the bibtex entry, like

note = {Computing Research Repository, {arXiv}:1503.06733. Version 2},

would make the output match the recommended format, but it seems like the style file should do something like this itself (at the very least, I think it should include the {arXiv}:1503.06733 part in the output). Is there a reason why the .bst style file doesn't do this by default?

postylem, May 17 '22 12:05

Thanks for reporting. This isn’t quite my wheelhouse, tagging @nschneid @davidweichiang @danielgildea who might have an idea.

mjpost, May 17 '22 13:05

I don't know who made the current .bst file. Need to rerun Merlin?

nschneid, May 17 '22 14:05

Implementing this would require logic to recognize a BibTeX entry generated by arXiv and do a bunch of special handling. This isn't a standard thing for bst styles to do. I don't know how we could do it without breaking people's existing BibTeX files, and without breaking the ability to use the same BibTeX file for ACL submissions and other bst styles.

danielgildea, May 17 '22 14:05

It might be relevant (though not necessarily helpful) to point out that arXiv has changed the format of its autogenerated BibTeX records...

It used to be in a format like:

@misc{arxiv:1907.10597-oldformat,
    author = {Roy Schwartz and Jesse Dodge and Noah A. Smith and Oren Etzioni},
    title = {Green AI},
    archiveprefix = {arXiv},
    eprint = {1907.10597},
    primaryclass = {cs.CY},
    year = {2019}}

In the new (current) format it includes a DOI and URL, but drops the archivePrefix, eprint, and primaryClass fields, whose presence would make the suggested formatting much easier to produce with a bst style...

@misc{arxiv:1907.10597-newformat,
  author = {Schwartz, Roy and Dodge, Jesse and Smith, Noah A. and Etzioni, Oren},
  title = {Green AI},
  doi = {10.48550/ARXIV.1907.10597},
  url = {https://arxiv.org/abs/1907.10597},
  publisher = {arXiv},
  year = {2019}}
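
As a rough illustration of how the two formats relate, here is a minimal Python sketch that recovers the eprint and archiveprefix fields of the old format from the doi or url field of a new-format entry. The function name add_eprint_fields and the dict-based representation of an entry are assumptions made for this example; it is not part of the ACL style files, it does not parse .bib files, and it only handles new-style identifiers such as 1907.10597.

```python
import re

# Matches new-style arXiv identifiers such as 1907.10597, optionally followed
# by a version suffix like v2. Old-style identifiers (e.g. cs/0101001) are
# deliberately not handled in this sketch.
_ARXIV_ID = re.compile(r"(\d{4}\.\d{4,5})(v\d+)?$")


def add_eprint_fields(entry):
    """Return a copy of `entry` with eprint/archiveprefix filled in if possible.

    `entry` is a plain dict mapping lowercase BibTeX field names to values
    (a hypothetical representation chosen for this example).
    """
    result = dict(entry)
    if "eprint" in result:
        return result  # old-format entry: nothing to do
    for field in ("doi", "url"):
        value = result.get(field, "")
        # arXiv DOIs look like 10.48550/ARXIV.<id>; URLs like .../abs/<id>
        match = _ARXIV_ID.search(value)
        if match and "arxiv" in value.lower():
            result["eprint"] = match.group(1)
            result["archiveprefix"] = "arXiv"
            break
    return result


new_format = {
    "author": "Schwartz, Roy and Dodge, Jesse and Smith, Noah A. and Etzioni, Oren",
    "title": "Green AI",
    "doi": "10.48550/ARXIV.1907.10597",
    "url": "https://arxiv.org/abs/1907.10597",
    "publisher": "arXiv",
    "year": "2019",
}
fixed = add_eprint_fields(new_format)
print(fixed["archiveprefix"], fixed["eprint"])  # prints: arXiv 1907.10597
```

The primaryclass field cannot be recovered this way, since neither the DOI nor the URL carries the subject class.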

postylem, May 17 '22 14:05

Oh I may have misunderstood the request. What is the .bib entry in the ACL stylesheet?

nschneid, May 17 '22 14:05

In the latex/custom.bib file in this repo it is

@article{rasooli-tetrault-2015,
    author    = {Mohammad Sadegh Rasooli and Joel R. Tetreault},
    title     = {Yara Parser: {A} Fast and Accurate Dependency Parser},
    journal   = {Computing Research Repository},
    volume    = {arXiv:1503.06733},
    year      = {2015},
    url       = {http://arxiv.org/abs/1503.06733},
    note    = {version 2}
}

which works with the current bst file to make

Mohammad Sadegh Rasooli and Joel R. Tetreault. 2015. Yara parser: A fast and accurate dependency parser. Computing Research Repository, arXiv:1503.06733. Version 2.

But if one copies the autogenerated BibTeX from arXiv.org (as I imagine most people do), then the output gives no indication that the publisher is arXiv, or what the eprint code is, unless the reader clicks the link.

Perhaps this can't be helped?

(I have noticed that a lot of ACL papers contain references with only a linked title and no other information, so it would be nice if this could be fixed, but perhaps the only way would be for arXiv to change the format of its autogenerated BibTeX.)

postylem, May 17 '22 14:05

Thanks for pointing out the issue. In principle I don't see why the .bst shouldn't print fields for @misc entries that it currently prints only for other kinds of entries—in this case, publisher (arXiv).

Isolating the eprint number seems hard, but the full URL could be displayed for those reading on paper if there is a @misc entry with no booktitle.

In practice I don't know how easy it would be to update/regenerate our .bst.

nschneid, May 17 '22 14:05

(I don't know if we should recommend where people should get their BibTeX, besides the Anthology itself. I use Zotero, which is about to update its handling of arXiv imports: zotero/translators#2788)

nschneid, May 17 '22 14:05

The existing .bst file can indeed detect the eprint field, so it's too bad that the autogenerated entries from arXiv were changed.

davidweichiang, Dec 15 '23 15:12