acl-anthology
paper -> proceedings -> event links in Wikidata
For my research I am trying to trace papers to the events (conferences) they were presented at. The following SPARQL query gives some good results, but the results seem to be incomplete. How could this situation be improved?
# ACL Anthology article ID
SELECT ?article ?articleLabel ?aclId ?publishedIn ?publishedInLabel ?event ?eventLabel WHERE {
#ACL Anthology article ID
?article wdt:P7505 ?aclId.
?article rdfs:label ?articleLabel .
#?aclIdStatement (ps:P7505) ?aclId.
?article wdt:P1433 ?publishedIn.
?publishedIn rdfs:label ?publishedInLabel .
OPTIONAL {
# is proceedings from
?publishedIn wdt:P4745 ?event.
?event rdfs:label ?eventLabel.
}
}
I used the sparqlquery command-line tool from https://github.com/WolfgangFahl/pyLoDStorage to show the details of the query, which is named "ACL-Paper2Event" in https://github.com/WolfgangFahl/pyLoDStorage/blob/master/sampledata/scholia.yaml:
sparqlquery -qp scholia.yaml -qn "ACL-Paper2Event" -f github
ACL-Paper2Event
query
# ACL Anthology article ID
SELECT ?article ?articleLabel ?aclId ?publishedIn ?publishedInLabel ?event ?eventLabel WHERE {
#ACL Anthology article ID
?article wdt:P7505 ?aclId.
?article rdfs:label ?articleLabel .
#?aclIdStatement (ps:P7505) ?aclId.
?article wdt:P1433 ?publishedIn.
?publishedIn rdfs:label ?publishedInLabel .
#OPTIONAL {
# is proceedings from
?publishedIn wdt:P4745 ?event.
?event rdfs:label ?eventLabel.
#}
} LIMIT 50
result
| article | articleLabel | aclId | publishedIn | publishedInLabel | event | eventLabel |
|---|---|---|---|---|---|---|
| Q79020060 | Common Voice: A Massively-Multilingual Speech Corpus | 2020.lrec-1.520 | Q95997327 | Proceedings of The 12th Language Resources and Evaluation Conference | Q61919909 | 12th Conference on Language Resources and Evaluation |
| Q79020060 | Common Voice: A Massively-Multilingual Speech Corpus | 2020.lrec-1.520 | Q95997327 | Proceedings of The 12th Language Resources and Evaluation Conference | Q61919909 | 12th Conference on Language Resources and Evaluation |
| Q79020060 | Common Voice: A Massively-Multilingual Speech Corpus | 2020.lrec-1.520 | Q95997327 | Proceedings of The 12th Language Resources and Evaluation Conference | Q61919909 | 12th Conference on Language Resources and Evaluation |
| Q79020060 | Common Voice: A Massively-Multilingual Speech Corpus | 2020.lrec-1.520 | Q95997327 | Proceedings of The 12th Language Resources and Evaluation Conference | Q61919909 | 12th Conference on Language Resources and Evaluation |
| Q79020060 | Common Voice: A Massively-Multilingual Speech Corpus | 2020.lrec-1.520 | Q95997327 | Proceedings of The 12th Language Resources and Evaluation Conference | Q61919909 | 12th Conference on Language Resources and Evaluation |
| Q79020060 | Common Voice: A Massively-Multilingual Speech Corpus | 2020.lrec-1.520 | Q95997327 | Proceedings of The 12th Language Resources and Evaluation Conference | Q61919909 | 12th Conference on Language Resources and Evaluation |
| Q79020060 | Common Voice: A Massively-Multilingual Speech Corpus | 2020.lrec-1.520 | Q95997327 | Proceedings of The 12th Language Resources and Evaluation Conference | Q61919909 | 12th Conference on Language Resources and Evaluation |
| Q79020060 | Common Voice: A Massively-Multilingual Speech Corpus | 2020.lrec-1.520 | Q95997327 | Proceedings of The 12th Language Resources and Evaluation Conference | Q61919909 | 12th Conference on Language Resources and Evaluation |
| Q61895831 | The word analogy testing caveat | N18-2039 | Q55434859 | Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) | Q75696024 | The 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies |
| Q61895831 | The word analogy testing caveat | N18-2039 | Q55434859 | Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) | Q75696024 | The 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies |
| Q61895831 | The word analogy testing caveat | N18-2039 | Q55434859 | Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) | Q75696024 | The 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies |
| Q61895831 | The word analogy testing caveat | N18-2039 | Q55434859 | Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) | Q75696024 | The 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies |
| Q61895831 | The word analogy testing caveat | N18-2039 | Q55434859 | Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) | Q75696024 | The 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies |
| Q61895831 | The word analogy testing caveat | N18-2039 | Q55434859 | Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) | Q75696024 | The 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies |
| Q61895831 | The word analogy testing caveat | N18-2039 | Q55434859 | Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) | Q75696024 | The 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies |
| Q61895831 | The word analogy testing caveat | N18-2039 | Q55434859 | Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) | Q75696024 | The 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies |
| Q61895831 | The word analogy testing caveat | N18-2039 | Q55434859 | Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) | Q75696024 | The 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies |
| Q61895831 | The word analogy testing caveat | N18-2039 | Q55434859 | Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) | Q75696024 | The 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies |
| Q61895831 | The word analogy testing caveat | N18-2039 | Q55434859 | Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) | Q75696024 | The 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies |
| Q61895831 | The word analogy testing caveat | N18-2039 | Q55434859 | Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) | Q75696024 | The 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies |
| Q110887400 | The Power of Scale for Parameter-Efficient Prompt Tuning | 2021.emnlp-main.243 | Q109517629 | Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing | Q109517651 | The 2021 Conference on Empirical Methods in Natural Language Processing |
| Q110887400 | The Power of Scale for Parameter-Efficient Prompt Tuning | 2021.emnlp-main.243 | Q109517629 | Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing | Q109517651 | The 2021 Conference on Empirical Methods in Natural Language Processing |
| Q110887400 | The Power of Scale for Parameter-Efficient Prompt Tuning | 2021.emnlp-main.243 | Q109517629 | Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing | Q109517651 | The 2021 Conference on Empirical Methods in Natural Language Processing |
| Q110887400 | The Power of Scale for Parameter-Efficient Prompt Tuning | 2021.emnlp-main.243 | Q109517629 | Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing | Q109517651 | The 2021 Conference on Empirical Methods in Natural Language Processing |
| Q108673464 | Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia | 2020.emnlp-demos.4 | Q108673475 | Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations | Q82290350 | The 2020 Conference on Empirical Methods in Natural Language Processing |
| Q108673464 | Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia | 2020.emnlp-demos.4 | Q108673475 | Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations | Q82290350 | The 2020 Conference on Empirical Methods in Natural Language Processing |
| Q108673464 | Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia | 2020.emnlp-demos.4 | Q108673475 | Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations | Q82290350 | The 2020 Conference on Empirical Methods in Natural Language Processing |
| Q108673464 | Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia | 2020.emnlp-demos.4 | Q108673475 | Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations | Q82290350 | The 2020 Conference on Empirical Methods in Natural Language Processing |
| Q108673464 | Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia | 2020.emnlp-demos.4 | Q108673475 | Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations | Q82290350 | The 2020 Conference on Empirical Methods in Natural Language Processing |
| Q108673464 | Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia | 2020.emnlp-demos.4 | Q108673475 | Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations | Q82290350 | The 2020 Conference on Empirical Methods in Natural Language Processing |
| Q108673464 | Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia | 2020.emnlp-demos.4 | Q108673475 | Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations | Q82290350 | The 2020 Conference on Empirical Methods in Natural Language Processing |
| Q108673464 | Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia | 2020.emnlp-demos.4 | Q108673475 | Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations | Q82290350 | The 2020 Conference on Empirical Methods in Natural Language Processing |
| Q107060118 | The Danish Gigaword Corpus | 2021.nodalida-main.46 | Q107059887 | Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2, 2021 | Q102274071 | The 23rd Nordic Conference on Computational Linguistics |
| Q107060118 | The Danish Gigaword Corpus | 2021.nodalida-main.46 | Q107059887 | Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2, 2021 | Q102274071 | The 23rd Nordic Conference on Computational Linguistics |
| Q107060118 | The Danish Gigaword Corpus | 2021.nodalida-main.46 | Q107059887 | Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2, 2021 | Q102274071 | The 23rd Nordic Conference on Computational Linguistics |
| Q107060118 | The Danish Gigaword Corpus | 2021.nodalida-main.46 | Q107059887 | Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2, 2021 | Q102274071 | The 23rd Nordic Conference on Computational Linguistics |
| Q107060118 | The Danish Gigaword Corpus | 2021.nodalida-main.46 | Q107059887 | Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2, 2021 | Q102274071 | The 23rd Nordic Conference on Computational Linguistics |
| Q107060118 | The Danish Gigaword Corpus | 2021.nodalida-main.46 | Q107059887 | Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2, 2021 | Q102274071 | The 23rd Nordic Conference on Computational Linguistics |
| Q107060118 | The Danish Gigaword Corpus | 2021.nodalida-main.46 | Q107059887 | Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2, 2021 | Q102274071 | The 23rd Nordic Conference on Computational Linguistics |
| Q107060118 | The Danish Gigaword Corpus | 2021.nodalida-main.46 | Q107059887 | Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2, 2021 | Q102274071 | The 23rd Nordic Conference on Computational Linguistics |
| Q107060118 | The Danish Gigaword Corpus | 2021.nodalida-main.46 | Q107059887 | Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2, 2021 | Q102274071 | The 23rd Nordic Conference on Computational Linguistics |
| Q107060118 | The Danish Gigaword Corpus | 2021.nodalida-main.46 | Q107059887 | Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2, 2021 | Q102274071 | The 23rd Nordic Conference on Computational Linguistics |
| Q107060118 | The Danish Gigaword Corpus | 2021.nodalida-main.46 | Q107059887 | Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2, 2021 | Q102274071 | The 23rd Nordic Conference on Computational Linguistics |
| Q107060118 | The Danish Gigaword Corpus | 2021.nodalida-main.46 | Q107059887 | Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2, 2021 | Q102274071 | The 23rd Nordic Conference on Computational Linguistics |
| Q105730737 | DanNet2: Extending the coverage of adjectives in DanNet based on thesaurus data (project presentation) | 2021.gwc-1.31 | Q105730699 | Proceedings of the 11th Global Wordnet Conference | Q105730832 | The 11th Global WordNet Conference |
| Q105730737 | DanNet2: Extending the coverage of adjectives in DanNet based on thesaurus data (project presentation) | 2021.gwc-1.31 | Q105730699 | Proceedings of the 11th Global Wordnet Conference | Q105730832 | The 11th Global WordNet Conference |
| Q105730737 | DanNet2: Extending the coverage of adjectives in DanNet based on thesaurus data (project presentation) | 2021.gwc-1.31 | Q105730699 | Proceedings of the 11th Global Wordnet Conference | Q105730832 | The 11th Global WordNet Conference |
| Q105730737 | DanNet2: Extending the coverage of adjectives in DanNet based on thesaurus data (project presentation) | 2021.gwc-1.31 | Q105730699 | Proceedings of the 11th Global Wordnet Conference | Q105730832 | The 11th Global WordNet Conference |
| Q107009138 | Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training | 2021.naacl-main.278 | Q107009154 | Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies | Q107009143 | 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics |
| Q107009138 | Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training | 2021.naacl-main.278 | Q107009154 | Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies | Q107009143 | 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics |
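A side note on the table above: its 50 rows cover only seven distinct papers. A likely cause (an assumption, not verified against the endpoint) is that the query binds rdfs:label three times without a language filter, so each language variant of a label yields its own row even when the displayed strings are identical. A FILTER on LANG() in the query would probably address this at the source. As a minimal client-side sketch, where `dedupe_rows` is a hypothetical helper and the sample rows are abbreviated from the table, exact repeats can be collapsed like this:

```python
def dedupe_rows(rows):
    """Drop exact duplicate rows while keeping first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # rows must be hashable to be tracked in a set
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Abbreviated sample: (article, aclId, publishedIn, event) taken from the table.
rows = [
    ("Q79020060", "2020.lrec-1.520", "Q95997327", "Q61919909"),
    ("Q79020060", "2020.lrec-1.520", "Q95997327", "Q61919909"),
    ("Q61895831", "N18-2039", "Q55434859", "Q75696024"),
]
unique_rows = dedupe_rows(rows)
```

This only hides the repetition; it does not help with the papers that are missing from the result in the first place.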
Is there any reason you want to do this on Wikidata, instead of using the XML/YAML files we have in this repo?
(FWIW I'm not aware of any Anthology maintainer being involved in Wikidata, so I would be surprised if any of us could help you there.)
@mbollmann thanks for the swift reply. Wikidata is just a good environment, especially given the Scholia project. See https://scholia.toolforge.org/event-series/Q56571145 for an example entry for an event. https://www.wikidata.org/wiki/Property:P7505 states that there are potentially 50,000 articles. On the ACL Anthology website I found "The ACL Anthology currently hosts 74465 papers on the study of computational linguistics and natural language processing."
Indeed, I might be interested in analysing the XML/YAML files and looking for conference proceedings. It looks like there has not yet been a bot transferring the entries to Wikidata (the WikiCite project).
I see. I'm not familiar with the Scholia project unfortunately; I do know Wikidata, but I am not aware of any transfer between the ACL Anthology and Wikidata, or who might have done it for the entries that already exist there.
Here's a quick example of what you can get from our Python library (in bin/):
>>> from anthology import Anthology  # assumes the session is started inside bin/
>>> ant = Anthology("../data/")
>>> paper = ant.papers["2020.lrec-1.520"]
>>> ant.volumes[paper.parent_volume_id].get_title()
'Proceedings of the 12th Language Resources and Evaluation Conference'
>>> ant.venues.get_main_venue("2020.lrec-1.520")
'LREC'
>>> ant.venues.get_by_acronym("LREC")["name"]
'International Conference on Language Resources and Evaluation'
...where "2020.lrec-1.520" can be any ACL paper ID. The information is pulled from the XML/YAML files in the data/ directory, so you could also use other tools to extract data from them.
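If you go the other-tools route, a first step is probably figuring out which file a given paper ID lives in. Here is a rough sketch of that, under assumptions that are not confirmed in this thread: that new-style IDs follow `<year>.<venue>-<volume>.<paper>`, that old-style IDs follow `<letter><yy>-<volume digit><3-digit paper>`, and that each collection sits in `data/xml/<collection>.xml`. Both `split_acl_id` and `collection_file` are hypothetical helpers, not part of the Anthology library.

```python
import re

# Assumed ID layouts (see the caveats in the text above):
#   new style: <year>.<venue>-<volume>.<paper>, e.g. 2020.lrec-1.520
#   old style: <letter><yy>-<volume digit><paper>, e.g. N18-2039
# Old-style "W" (workshop) IDs are deliberately rejected, because they
# seem to split differently and would otherwise be parsed wrongly.
NEW_STYLE = re.compile(r"^(\d{4}\.[a-z0-9]+)-([a-z0-9]+)\.(\d+)$")
OLD_STYLE = re.compile(r"^([A-VX-Z]\d{2})-(\d)(\d{3})$")


def split_acl_id(acl_id):
    """Return (collection, volume, paper) as strings for one paper ID."""
    for pattern in (NEW_STYLE, OLD_STYLE):
        match = pattern.match(acl_id)
        if match:
            return match.groups()
    raise ValueError(f"unrecognised ACL paper ID: {acl_id!r}")


def collection_file(acl_id):
    """Guess the XML file holding this paper (assumed naming scheme)."""
    collection, _, _ = split_acl_id(acl_id)
    return f"data/xml/{collection}.xml"
```

For example, `split_acl_id("2021.emnlp-main.243")` separates the collection `2021.emnlp` from the volume `main`, which is the level at which proceedings (and therefore events) would be matched.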