
Wrong bibliography entries for journal frontmatter

Open mcandito opened this issue 2 years ago • 7 comments

Hi @mjpost,

Thanks for finalizing the ingestion of the TAL journal volumes #2431

You're right that a minor problem remains when generating the frontmatter BibTeX entry (@proceedings type), so I am opening this new issue.

Issue description

Currently, the name of the journal appears twice:

cf. e.g. https://aclanthology.org/2011.tal-1.0/

Éric Villemonte de La Clergerie, Béatrice Daille, Yves Lepage, and François Yvon. 2011. Traitement Automatique des Langues, Volume 52, Numéro 1 : Varia [Varia]. Traitement Automatique des Langues, 52(1).

What's the expected result?

Instead, could it look like:

Éric Villemonte de La Clergerie, Béatrice Daille, Yves Lepage, and François Yvon, editors. 2011. Traitement Automatique des Langues, Volume 52, Numéro 1 : Varia [Varia]. ATALA (Association pour le Traitement Automatique des Langues), France.

To get this, could you modify the way you create the BibTeX entry:

  • by removing the journal and the number attributes (in accordance with the spec of @proceedings )
  • and also removing the volume attribute (even though this is in contradiction with the spec)
  • taking care to append the issue number to the bibkey; otherwise the same key will be produced for all 3 issues of a given year/volume

Namely, produce a bib file like:

    @proceedings{tal-2011-1,
        title = {Traitement Automatique des Langues, Volume 52, Numéro 1 : Varia [Varia]},
        editor = {Villemonte de la Clergerie, Éric and Daille, Béatrice and Lepage, Yves and Yvon, François},
        publisher = {ATALA (Association pour le Traitement Automatique des Langues)},
        year = "2011",
        address = "France",
        url = {https://aclanthology.org/2011.tal-1.0},
    }
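For illustration only, the requested entry could be generated along the following lines. This is a minimal sketch, not the Anthology's actual generation code: the function name `frontmatter_bibtex` and its parameters are hypothetical, and the key pattern `venue-year-issue` is simply taken from the `tal-2011-1` example above.

```python
def frontmatter_bibtex(venue, year, issue, title, editors, publisher, address, url):
    """Build a @proceedings entry for the frontmatter of one journal issue.

    Hypothetical helper: names and key format are assumptions for illustration.
    """
    # Appending the issue number keeps keys distinct across the issues of one year.
    key = f"{venue}-{year}-{issue}"
    # Deliberately no journal, number, or volume fields.
    fields = [
        ("title", title),
        ("editor", " and ".join(editors)),
        ("publisher", publisher),
        ("year", str(year)),
        ("address", address),
        ("url", url),
    ]
    body = "\n".join(f"    {name} = {{{value}}}," for name, value in fields)
    return f"@proceedings{{{key},\n{body}\n}}"


entry = frontmatter_bibtex(
    venue="tal",
    year=2011,
    issue=1,
    title="Traitement Automatique des Langues, Volume 52, Numéro 1 : Varia [Varia]",
    editors=[
        "Villemonte de la Clergerie, Éric",
        "Daille, Béatrice",
        "Lepage, Yves",
        "Yvon, François",
    ],
    publisher="ATALA (Association pour le Traitement Automatique des Langues)",
    address="France",
    url="https://aclanthology.org/2011.tal-1.0",
)
print(entry)
```

Building the key from the issue number rather than the volume number is what prevents the key collision mentioned in the last bullet.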

Thank you

mcandito avatar Apr 11 '23 16:04 mcandito

Thanks, I believe this affects all journals that have frontmatter (e.g. https://aclanthology.org/2020.ijclclp-1.0/), so we should probably fix the way we generate bibliography entries for journal frontmatter in general.

Before I look into this, I'd like to clarify #2685 though, i.e. whether this should even be frontmatter in the first place.

mbollmann avatar Jul 26 '23 11:07 mbollmann

This seems clearly to be a mistake where the full volume was ingested as frontmatter. @anthology-assist can you fix this? The steps should be:

  • Move the <url> tag from the <frontmatter> to the <meta> block
  • Remove the <frontmatter> block
  • Rename the PDF (remove .0 from the file name)
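The three steps above can be sketched as follows. This is only an illustration under stated assumptions: the sample XML is a simplified, hypothetical stand-in for a real volume record, and the helper names `move_url_to_meta` and `volume_pdf_name` are invented here. Whether the text of the moved `<url>` also needs updating is not covered by the steps, so the sketch leaves it untouched.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified volume record (not copied from the real data).
SAMPLE = """<volume id="1">
  <meta><booktitle>Traitement Automatique des Langues</booktitle></meta>
  <frontmatter><url>2017.tal-1.0</url></frontmatter>
</volume>"""


def move_url_to_meta(volume):
    """Move <url> from <frontmatter> into <meta>, then drop <frontmatter>."""
    frontmatter = volume.find("frontmatter")
    if frontmatter is None:
        return volume  # nothing to fix
    meta = volume.find("meta")
    if meta is None:
        raise ValueError("volume has no <meta> block")
    url = frontmatter.find("url")
    if url is not None:
        frontmatter.remove(url)
        meta.append(url)
    volume.remove(frontmatter)
    return volume


def volume_pdf_name(frontmatter_pdf_name):
    """Return the file name with the '.0' frontmatter marker removed."""
    suffix = ".0.pdf"
    if not frontmatter_pdf_name.endswith(suffix):
        raise ValueError(f"not a frontmatter PDF name: {frontmatter_pdf_name}")
    return frontmatter_pdf_name[: -len(suffix)] + ".pdf"


volume = move_url_to_meta(ET.fromstring(SAMPLE))
print(ET.tostring(volume, encoding="unicode"))
print(volume_pdf_name("2017.tal-1.0.pdf"))
```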

mjpost avatar Jul 26 '23 12:07 mjpost

@mjpost Note that you wanted to explicitly add the frontmatter always, cf. #2412, so probably there's more to this?

mbollmann avatar Jul 26 '23 13:07 mbollmann

Hi @mbollmann @mjpost, let me point out that #2412 was created precisely because of the TAL ingestion: for TAL, volumes before 2017 don't come with a global PDF (no 0.pdf file), whereas volumes from 2017 onward do.

Marie

mcandito avatar Jul 26 '23 13:07 mcandito

@mcandito Thanks, but the issue here is that the full-volume PDF should be 20xx.tal-1.pdf and linked under the volume page (e.g. https://aclanthology.org/volumes/2017.tal-1/), not 20xx.tal-1.0.pdf. The latter name is reserved for frontmatter and should contain only e.g. the first 12 pages of https://aclanthology.org/2017.tal-1.0.pdf, i.e. table of contents, preface, etc., but no actual journal articles.
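To make the naming convention concrete, here is a small sketch that classifies a PDF file name as full volume, frontmatter, or paper. The regular expression and the function name `classify_pdf` are assumptions made for illustration; only the `.pdf` versus `.0.pdf` distinction comes from the comment above.

```python
import re

# Assumed pattern: <year>.<venue>-<volume>[.<item>].pdf
PDF_NAME = re.compile(r"(\d{4}\.[a-z0-9]+-\d+)(?:\.(\d+))?\.pdf")


def classify_pdf(name):
    """Return 'volume', 'frontmatter', or 'paper' for a PDF file name."""
    match = PDF_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized file name: {name}")
    item = match.group(2)
    if item is None:
        return "volume"       # e.g. 2017.tal-1.pdf, the full-volume PDF
    if item == "0":
        return "frontmatter"  # e.g. 2017.tal-1.0.pdf, table of contents, preface
    return "paper"            # any other item number


for name in ["2017.tal-1.pdf", "2017.tal-1.0.pdf", "2017.tal-1.3.pdf"]:
    print(name, classify_pdf(name))
```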

mbollmann avatar Jul 26 '23 13:07 mbollmann

Thanks @mbollmann

  • I understand that the 20xx.tal-1.0.pdf files are longer than they should be. I think this was a workaround, because in the first attempts at TAL ingestion there was no entry for full volumes

  • Note that in the zips I currently provide, the PDFs for full volumes are present twice: both as 20xx.tal-y.0.pdf and as 20xx.tal-y.pdf

  • But it seems to me the bug cannot come from a pdf (the 0.pdf) being too long

  • Maybe the problem comes from the fact that in the "BibTeX file containing entries for all papers" (namely 20xx / data / tal-20xx-y / proceedings / cdrom / tal-20xx-y.bib) I have included an entry for each paper, plus an entry for the full volume

=> Should I remove this?

mcandito avatar Jul 26 '23 14:07 mcandito

@mcandito

  • But it seems to me the bug cannot come from a pdf (the 0.pdf) being too long

No, that bug comes from our side, probably because we didn't pay enough attention to frontmatter before, particularly for journals, as most of the volumes are conference proceedings. (Actually, even the citation strings for conf. proceedings frontmatter look a bit off on the website...)

The question of whether there should even be a <frontmatter> block for TAL, or what it should contain/link to, is orthogonal to that. That discussion probably belongs in #2685 or #2439, and we should first make sure that we (on the Anthology's side) are on the same page before deciding whether you should change anything in the TAL data.

mbollmann avatar Jul 26 '23 14:07 mbollmann