acl-anthology
Wrong bibliography entries for journal frontmatter
Hi @mjpost,
Thanks for finalizing the ingestion of the TAL journal volumes #2431
You're right, there remains a minor problem when generating the frontmatter BibTeX entry (`@proceedings` type), so I am creating this new issue.
Issue description
Currently, the name of the journal appears twice:
cf. e.g. https://aclanthology.org/2011.tal-1.0/
Éric Villemonte de La Clergerie, Béatrice Daille, Yves Lepage, and François Yvon. 2011. Traitement Automatique des Langues, Volume 52, Numéro 1 : Varia [Varia]. Traitement Automatique des Langues, 52(1).
What's the expected result?
Instead, could it look like:
Éric Villemonte de La Clergerie, Béatrice Daille, Yves Lepage, and François Yvon, editors. 2011. Traitement Automatique des Langues, Volume 52, Numéro 1 : Varia [Varia]. ATALA (Association pour le Traitement Automatique des Langues), France.
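A minimal sketch of how such a citation string could be assembled from the frontmatter metadata. The `Frontmatter` class and both functions are hypothetical illustrations, not the Anthology's actual code; the point is that the journal name is no longer repeated after the title, the editors get an "editors" suffix, and the publisher and address take the journal's place.

```python
from dataclasses import dataclass


@dataclass
class Frontmatter:
    """Hypothetical container for the metadata of a volume's frontmatter."""
    editors: list[str]  # full names, in display order
    year: int
    title: str
    publisher: str
    address: str


def join_names(names: list[str]) -> str:
    """Join names as 'A', 'A and B', or 'A, B, and C' (serial comma)."""
    if len(names) <= 1:
        return "".join(names)
    if len(names) == 2:
        return f"{names[0]} and {names[1]}"
    return ", ".join(names[:-1]) + ", and " + names[-1]


def frontmatter_citation(fm: Frontmatter) -> str:
    """Editors, year, title, then publisher and address (no journal name)."""
    suffix = "editor" if len(fm.editors) == 1 else "editors"
    return (
        f"{join_names(fm.editors)}, {suffix}. {fm.year}. "
        f"{fm.title}. {fm.publisher}, {fm.address}."
    )


fm = Frontmatter(
    editors=["Éric Villemonte de La Clergerie", "Béatrice Daille", "Yves Lepage", "François Yvon"],
    year=2011,
    title="Traitement Automatique des Langues, Volume 52, Numéro 1 : Varia [Varia]",
    publisher="ATALA (Association pour le Traitement Automatique des Langues)",
    address="France",
)
print(frontmatter_citation(fm))
```

With the metadata of the 2011 example, this reproduces the expected string above.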
To get this, could you modify the way you create the BibTeX entry:
- by removing the `journal` and the `number` attributes (in accordance with the spec of `@proceedings`),
- and also by removing the `volume` attribute (even though this contradicts the spec),
- being careful to concatenate the number to the bibkey, otherwise the same key will be produced for all 3 issues of a year-volume.
Namely, produce a bib file like:

```bibtex
@proceedings{tal-2011-1,
    title = {Traitement Automatique des Langues, Volume 52, Numéro 1 : Varia [Varia]},
    editor = {Villemonte de la Clergerie, Éric and Daille, Béatrice and Lepage, Yves and Yvon, François},
    publisher = {ATALA (Association pour le Traitement Automatique des Langues)},
    year = "2011",
    address = "France",
    url = {https://aclanthology.org/2011.tal-1.0},
}
```
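As a rough sketch (hypothetical helper names, not the Anthology's real BibTeX exporter), the requested changes amount to filtering out three fields and extending the key with the issue number. Values are emitted in braces without any escaping, which a real exporter would have to handle; braces and double quotes are interchangeable as BibTeX field delimiters, so `year = {2011}` is equivalent to `year = "2011"`.

```python
def bibkey(venue: str, year: int, number: int) -> str:
    """Hypothetical key scheme: venue-year-number, so that the three issues
    of one year-volume get distinct keys (e.g. tal-2011-1, tal-2011-2)."""
    return f"{venue}-{year}-{number}"


# Fields this issue asks to drop from journal frontmatter entries.
DROPPED_FIELDS = {"journal", "number", "volume"}


def proceedings_entry(key: str, fields: dict[str, str]) -> str:
    """Render an @proceedings entry, skipping the dropped fields.

    Values are wrapped in braces and are NOT escaped.
    """
    lines = [f"@proceedings{{{key},"]
    for name, value in fields.items():
        if name not in DROPPED_FIELDS:
            lines.append(f"    {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


entry = proceedings_entry(
    bibkey("tal", 2011, 1),
    {
        "title": "Traitement Automatique des Langues, Volume 52, Numéro 1 : Varia [Varia]",
        "journal": "Traitement Automatique des Langues",  # dropped
        "volume": "52",  # dropped
        "number": "1",  # dropped
        "publisher": "ATALA (Association pour le Traitement Automatique des Langues)",
        "year": "2011",
        "address": "France",
        "url": "https://aclanthology.org/2011.tal-1.0",
    },
)
print(entry)
```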
Thank you
Thanks, I believe this affects all journals that have frontmatter (e.g. https://aclanthology.org/2020.ijclclp-1.0/), so we should probably fix the way we generate bibliography entries for journal frontmatter in general.
Before I look into this, I'd like to clarify #2685 though, i.e. whether this should even be frontmatter in the first place.
This clearly seems to be a mistake: the full volume was ingested as frontmatter. @anthology-assist, can you fix this? The steps should be:
- Move the `<url>` tag from the `<frontmatter>` to the `<meta>` block
- Remove the `<frontmatter>` block
- Rename the PDF (remove `.0` from the file name)
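The first two steps can be sketched with Python's standard `xml.etree.ElementTree`. The `<volume>`/`<meta>`/`<frontmatter>` layout is assumed from the tag names in the steps, and stripping `.0` from the URL text (so it matches the renamed PDF) is an assumption as well; the actual file rename is left to the caller. Note that writing the tree back with `ElementTree` may not preserve the original formatting or comments.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def frontmatter_to_full_volume(volume: ET.Element) -> Optional[str]:
    """Move <url> from <frontmatter> to <meta> and drop <frontmatter>.

    Assumes a <volume> element with <meta> and <frontmatter> children.
    Returns the PDF name the file should be renamed to, or None if there
    was nothing to move.
    """
    meta = volume.find("meta")
    frontmatter = volume.find("frontmatter")
    if meta is None or frontmatter is None:
        return None
    url = frontmatter.find("url")
    if url is not None:
        # Step 1: move <url> into <meta>; dropping ".0" is an assumption.
        frontmatter.remove(url)
        if url.text and url.text.endswith(".0"):
            url.text = url.text[: -len(".0")]
        meta.append(url)
    # Step 2: remove the <frontmatter> block.
    volume.remove(frontmatter)
    # Step 3 (renaming the PDF on disk) is up to the caller.
    if url is None or not url.text:
        return None
    return f"{url.text}.pdf"


volume = ET.fromstring(
    "<volume id='1'><meta><year>2017</year></meta>"
    "<frontmatter><url>2017.tal-1.0</url></frontmatter></volume>"
)
new_pdf = frontmatter_to_full_volume(volume)
print(new_pdf)
print(ET.tostring(volume, encoding="unicode"))
```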
@mjpost Note that you explicitly wanted to always add the frontmatter, cf. #2412, so perhaps there's more to this?
Hi @mbollmann @mjpost, let me recall that #2412 was created precisely because of the TAL ingestion: for TAL, volumes before 2017 don't come with a global PDF (no 0.pdf file), whereas volumes from 2017 onward do.
Marie
@mcandito Thanks, but the issue here is that the full-volume PDF should be `20xx.tal-1.pdf` and linked from the volume page (e.g. https://aclanthology.org/volumes/2017.tal-1/), not `20xx.tal-1.0.pdf`, which is reserved for frontmatter and should only contain, e.g., the first 12 pages of https://aclanthology.org/2017.tal-1.0.pdf, i.e. the table of contents, preface, etc., but no actual journal articles.
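The naming convention described here can be made explicit with a small classifier. The regular expression is an assumption that only covers names shaped like `2017.tal-1.pdf` (full volume), `2017.tal-1.0.pdf` (frontmatter) and `2017.tal-1.3.pdf` (an article); it is not taken from the Anthology code base.

```python
import re
from typing import Optional

# Assumed shape: <year>.<venue>-<volume>[.<paper>].pdf
PDF_NAME = re.compile(
    r"^(?P<year>\d{4})\.(?P<venue>[a-z0-9]+)-(?P<volume>\d+)(?:\.(?P<paper>\d+))?\.pdf$"
)


def pdf_role(filename: str) -> Optional[str]:
    """Classify a PDF file name as full volume, frontmatter, or paper."""
    m = PDF_NAME.match(filename)
    if m is None:
        return None  # not an Anthology-style PDF name
    paper = m.group("paper")
    if paper is None:
        return "full volume"  # e.g. 2017.tal-1.pdf
    if int(paper) == 0:
        return "frontmatter"  # e.g. 2017.tal-1.0.pdf: TOC, preface, etc.
    return "paper"


for name in ["2017.tal-1.pdf", "2017.tal-1.0.pdf", "2017.tal-1.3.pdf"]:
    print(name, pdf_role(name))
```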
Thanks @mbollmann
- I understand that the `20xx.tal-1.0.pdf` files are longer than they should be. I think this was a workaround because, in the first attempts at TAL ingestion, there was no entry for full volumes.
- Note that in the zips I currently provide, the PDFs for full volumes are present twice: both as `20xx.tal-y.0.pdf` and as `20xx.tal-y.pdf`.
- But it seems to me the bug cannot come from a PDF (the 0.pdf) being too long.
- Maybe the problem comes from the fact that in the bib file of the "BibTeX file containing entries for all papers" (namely `20xx/data/tal-20xx-y/proceedings/cdrom/tal-20xx-y.bib`), I have included the entries for each paper, plus an entry for the full volume.
  => Shall I remove this?
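Regarding the full-volume PDFs that appear twice in the zips, one way to confirm which files are byte-identical copies is to group them by content hash. This is a generic sketch using only the Python standard library; it assumes the duplicates are exact copies and does not decide which copy should be kept.

```python
import hashlib
import zipfile


def duplicate_pdfs(zip_path: str) -> list[list[str]]:
    """Group the PDFs inside a zip archive by the SHA-256 of their content
    and return only the groups with more than one file name."""
    groups: dict[str, list[str]] = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".pdf"):
                continue
            digest = hashlib.sha256(archive.read(name)).hexdigest()
            groups.setdefault(digest, []).append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]
```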
@mcandito
> But it seems to me the bug cannot come from a pdf (the 0.pdf) being too long
No, that bug is on our side, probably because we didn't pay enough attention to frontmatter before, particularly for journals, as most of the volumes are conference proceedings. (Actually, even the citation strings for conference proceedings frontmatter look a bit off on the website...)
The question of whether there should even be a `<frontmatter>` block for TAL, or what it should contain or link to, is orthogonal to that. That discussion probably belongs in #2685 or #2439, and we should first make sure that we (on the Anthology's side) are on the same page before deciding whether you should change anything in the TAL data.