acl-anthology paper -> proceedings -> event links in wikidata

WolfgangFahl opened this issue 3 years ago (open, 4 comments)

For my research I am trying to trace papers to their events (conferences). The following SPARQL query gives some good results, but the result seems to be incomplete. How could this situation be improved?

# ACL Anthology article ID 
SELECT ?article ?articleLabel ?aclId ?publishedIn ?publishedInLabel ?event ?eventLabel WHERE {
  #ACL Anthology article ID 
  ?article wdt:P7505 ?aclId.
  ?article rdfs:label ?articleLabel .
  #?aclIdStatement (ps:P7505) ?aclId.
  ?article wdt:P1433 ?publishedIn.
  ?publishedIn rdfs:label ?publishedInLabel .
  OPTIONAL {
     # is proceedings from
     ?publishedIn wdt:P4745 ?event.
     ?event rdfs:label ?eventLabel.
  }
}

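
One way to make the query easier to work with is to fetch labels through the Wikidata label service (SERVICE wikibase:label) instead of rdfs:label, so each label comes back once in a single language, and to ask separately which proceedings items lack a P4745 ("is proceedings from") statement, which is one plausible reason the paper -> event chain looks incomplete. The following is a minimal sketch, not a fix confirmed on this data: the helper names are hypothetical, and it assumes the public endpoint at https://query.wikidata.org/sparql is reachable.

```python
# Sketch only: helper names are hypothetical, endpoint assumed reachable.
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Optional

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def paper_to_event_query(lang: str = "en", limit: Optional[int] = None) -> str:
    """Paper -> proceedings -> (optional) event, labels from the label service."""
    query = f"""SELECT ?article ?articleLabel ?aclId ?publishedIn ?publishedInLabel ?event ?eventLabel WHERE {{
  ?article wdt:P7505 ?aclId .                     # ACL Anthology article ID
  ?article wdt:P1433 ?publishedIn .               # published in
  OPTIONAL {{ ?publishedIn wdt:P4745 ?event . }}  # is proceedings from
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang}" . }}
}}"""
    if limit is not None:
        query += f"\nLIMIT {limit}"
    return query


def proceedings_without_event_query(lang: str = "en") -> str:
    """Proceedings containing ACL Anthology papers but without a P4745 statement."""
    return f"""SELECT DISTINCT ?publishedIn ?publishedInLabel WHERE {{
  ?article wdt:P7505 ?aclId .
  ?article wdt:P1433 ?publishedIn .
  FILTER NOT EXISTS {{ ?publishedIn wdt:P4745 ?event . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang}" . }}
}}"""


def run_query(query: str) -> List[Dict[str, str]]:
    """Send a query to WDQS and flatten the JSON bindings to {variable: value} dicts."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        # Wikimedia asks clients to send a descriptive User-Agent; this one is a placeholder.
        "User-Agent": "paper2event-sketch/0.1",
    })
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return [{var: cell["value"] for var, cell in row.items()}
            for row in data["results"]["bindings"]]
```

For example, run_query(proceedings_without_event_query()) would list the proceedings that still need a P4745 link. Adding those links on Wikidata is what would close the gaps in the paper -> event results.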

Feb 20 '22 10:02 WolfgangFahl

I have used the sparqlquery command-line tool from https://github.com/WolfgangFahl/pyLoDStorage to show the details of the query, which is named "ACL-Paper2Event" in https://github.com/WolfgangFahl/pyLoDStorage/blob/master/sampledata/scholia.yaml:

sparqlquery -qp scholia.yaml -qn "ACL-Paper2Event" -f github

ACL-Paper2Event

query

# ACL Anthology article ID 
SELECT ?article ?articleLabel ?aclId ?publishedIn ?publishedInLabel ?event ?eventLabel WHERE {
  #ACL Anthology article ID
  ?article wdt:P7505 ?aclId.
  ?article rdfs:label ?articleLabel .
  #?aclIdStatement (ps:P7505) ?aclId.
  ?article wdt:P1433 ?publishedIn.
  ?publishedIn rdfs:label ?publishedInLabel .
  #OPTIONAL {
     # is proceedings from
     ?publishedIn wdt:P4745 ?event.
     ?event rdfs:label ?eventLabel.
  #}
} LIMIT 50

result

article	articleLabel	aclId	publishedIn	publishedInLabel	event	eventLabel
Q79020060	Common Voice: A Massively-Multilingual Speech Corpus	2020.lrec-1.520	Q95997327	Proceedings of The 12th Language Resources and Evaluation Conference	Q61919909	12th Conference on Language Resources and Evaluation
Q79020060	Common Voice: A Massively-Multilingual Speech Corpus	2020.lrec-1.520	Q95997327	Proceedings of The 12th Language Resources and Evaluation Conference	Q61919909	12th Conference on Language Resources and Evaluation
Q79020060	Common Voice: A Massively-Multilingual Speech Corpus	2020.lrec-1.520	Q95997327	Proceedings of The 12th Language Resources and Evaluation Conference	Q61919909	12th Conference on Language Resources and Evaluation
Q79020060	Common Voice: A Massively-Multilingual Speech Corpus	2020.lrec-1.520	Q95997327	Proceedings of The 12th Language Resources and Evaluation Conference	Q61919909	12th Conference on Language Resources and Evaluation
Q79020060	Common Voice: A Massively-Multilingual Speech Corpus	2020.lrec-1.520	Q95997327	Proceedings of The 12th Language Resources and Evaluation Conference	Q61919909	12th Conference on Language Resources and Evaluation
Q79020060	Common Voice: A Massively-Multilingual Speech Corpus	2020.lrec-1.520	Q95997327	Proceedings of The 12th Language Resources and Evaluation Conference	Q61919909	12th Conference on Language Resources and Evaluation
Q79020060	Common Voice: A Massively-Multilingual Speech Corpus	2020.lrec-1.520	Q95997327	Proceedings of The 12th Language Resources and Evaluation Conference	Q61919909	12th Conference on Language Resources and Evaluation
Q79020060	Common Voice: A Massively-Multilingual Speech Corpus	2020.lrec-1.520	Q95997327	Proceedings of The 12th Language Resources and Evaluation Conference	Q61919909	12th Conference on Language Resources and Evaluation
Q61895831	The word analogy testing caveat	N18-2039	Q55434859	Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)	Q75696024	The 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Q61895831	The word analogy testing caveat	N18-2039	Q55434859	Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)	Q75696024	The 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Q61895831	The word analogy testing caveat	N18-2039	Q55434859	Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)	Q75696024	The 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Q61895831	The word analogy testing caveat	N18-2039	Q55434859	Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)	Q75696024	The 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Q61895831	The word analogy testing caveat	N18-2039	Q55434859	Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)	Q75696024	The 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Q61895831	The word analogy testing caveat	N18-2039	Q55434859	Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)	Q75696024	The 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Q61895831	The word analogy testing caveat	N18-2039	Q55434859	Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)	Q75696024	The 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Q61895831	The word analogy testing caveat	N18-2039	Q55434859	Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)	Q75696024	The 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Q61895831	The word analogy testing caveat	N18-2039	Q55434859	Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)	Q75696024	The 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Q61895831	The word analogy testing caveat	N18-2039	Q55434859	Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)	Q75696024	The 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Q61895831	The word analogy testing caveat	N18-2039	Q55434859	Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)	Q75696024	The 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Q61895831	The word analogy testing caveat	N18-2039	Q55434859	Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)	Q75696024	The 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Q110887400	The Power of Scale for Parameter-Efficient Prompt Tuning	2021.emnlp-main.243	Q109517629	Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing	Q109517651	The 2021 Conference on Empirical Methods in Natural Language Processing
Q110887400	The Power of Scale for Parameter-Efficient Prompt Tuning	2021.emnlp-main.243	Q109517629	Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing	Q109517651	The 2021 Conference on Empirical Methods in Natural Language Processing
Q110887400	The Power of Scale for Parameter-Efficient Prompt Tuning	2021.emnlp-main.243	Q109517629	Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing	Q109517651	The 2021 Conference on Empirical Methods in Natural Language Processing
Q110887400	The Power of Scale for Parameter-Efficient Prompt Tuning	2021.emnlp-main.243	Q109517629	Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing	Q109517651	The 2021 Conference on Empirical Methods in Natural Language Processing
Q108673464	Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia	2020.emnlp-demos.4	Q108673475	Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations	Q82290350	The 2020 Conference on Empirical Methods in Natural Language Processing
Q108673464	Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia	2020.emnlp-demos.4	Q108673475	Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations	Q82290350	The 2020 Conference on Empirical Methods in Natural Language Processing
Q108673464	Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia	2020.emnlp-demos.4	Q108673475	Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations	Q82290350	The 2020 Conference on Empirical Methods in Natural Language Processing
Q108673464	Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia	2020.emnlp-demos.4	Q108673475	Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations	Q82290350	The 2020 Conference on Empirical Methods in Natural Language Processing
Q108673464	Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia	2020.emnlp-demos.4	Q108673475	Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations	Q82290350	The 2020 Conference on Empirical Methods in Natural Language Processing
Q108673464	Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia	2020.emnlp-demos.4	Q108673475	Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations	Q82290350	The 2020 Conference on Empirical Methods in Natural Language Processing
Q108673464	Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia	2020.emnlp-demos.4	Q108673475	Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations	Q82290350	The 2020 Conference on Empirical Methods in Natural Language Processing
Q108673464	Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia	2020.emnlp-demos.4	Q108673475	Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations	Q82290350	The 2020 Conference on Empirical Methods in Natural Language Processing
Q107060118	The Danish Gigaword Corpus	2021.nodalida-main.46	Q107059887	Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2, 2021	Q102274071	The 23rd Nordic Conference on Computational Linguistics
Q107060118	The Danish Gigaword Corpus	2021.nodalida-main.46	Q107059887	Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2, 2021	Q102274071	The 23rd Nordic Conference on Computational Linguistics
Q107060118	The Danish Gigaword Corpus	2021.nodalida-main.46	Q107059887	Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2, 2021	Q102274071	The 23rd Nordic Conference on Computational Linguistics
Q107060118	The Danish Gigaword Corpus	2021.nodalida-main.46	Q107059887	Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2, 2021	Q102274071	The 23rd Nordic Conference on Computational Linguistics
Q107060118	The Danish Gigaword Corpus	2021.nodalida-main.46	Q107059887	Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2, 2021	Q102274071	The 23rd Nordic Conference on Computational Linguistics
Q107060118	The Danish Gigaword Corpus	2021.nodalida-main.46	Q107059887	Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2, 2021	Q102274071	The 23rd Nordic Conference on Computational Linguistics
Q107060118	The Danish Gigaword Corpus	2021.nodalida-main.46	Q107059887	Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2, 2021	Q102274071	The 23rd Nordic Conference on Computational Linguistics
Q107060118	The Danish Gigaword Corpus	2021.nodalida-main.46	Q107059887	Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2, 2021	Q102274071	The 23rd Nordic Conference on Computational Linguistics
Q107060118	The Danish Gigaword Corpus	2021.nodalida-main.46	Q107059887	Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2, 2021	Q102274071	The 23rd Nordic Conference on Computational Linguistics
Q107060118	The Danish Gigaword Corpus	2021.nodalida-main.46	Q107059887	Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2, 2021	Q102274071	The 23rd Nordic Conference on Computational Linguistics
Q107060118	The Danish Gigaword Corpus	2021.nodalida-main.46	Q107059887	Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2, 2021	Q102274071	The 23rd Nordic Conference on Computational Linguistics
Q107060118	The Danish Gigaword Corpus	2021.nodalida-main.46	Q107059887	Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2, 2021	Q102274071	The 23rd Nordic Conference on Computational Linguistics
Q105730737	DanNet2: Extending the coverage of adjectives in DanNet based on thesaurus data (project presentation)	2021.gwc-1.31	Q105730699	Proceedings of the 11th Global Wordnet Conference	Q105730832	The 11th Global WordNet Conference
Q105730737	DanNet2: Extending the coverage of adjectives in DanNet based on thesaurus data (project presentation)	2021.gwc-1.31	Q105730699	Proceedings of the 11th Global Wordnet Conference	Q105730832	The 11th Global WordNet Conference
Q105730737	DanNet2: Extending the coverage of adjectives in DanNet based on thesaurus data (project presentation)	2021.gwc-1.31	Q105730699	Proceedings of the 11th Global Wordnet Conference	Q105730832	The 11th Global WordNet Conference
Q105730737	DanNet2: Extending the coverage of adjectives in DanNet based on thesaurus data (project presentation)	2021.gwc-1.31	Q105730699	Proceedings of the 11th Global Wordnet Conference	Q105730832	The 11th Global WordNet Conference
Q107009138	Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training	2021.naacl-main.278	Q107009154	Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies	Q107009143	2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Q107009138	Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training	2021.naacl-main.278	Q107009154	Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies	Q107009143	2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics
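
Each article in this result appears between 2 and 12 times. A likely cause (an assumption, not checked against these items) is that rdfs:label matches one label per language, the tab-separated output drops the language tags, and the rows multiply across the three label variables. SELECT DISTINCT would not merge them, because "x"@en and "x"@de are different RDF terms; filtering with LANG() or using the wikibase:label service avoids the repeats at the source. For a result that has already been fetched, a client-side pass is enough. Here is a minimal sketch with a hypothetical dedupe_rows helper:

```python
from typing import List, Tuple


def dedupe_rows(tsv: str) -> Tuple[Tuple[str, ...], List[Tuple[str, ...]]]:
    """Split a tab-separated result into (header, unique data rows), keeping first-seen order."""
    lines = [line for line in tsv.splitlines() if line.strip()]
    header = tuple(lines[0].split("\t"))
    # dict.fromkeys preserves insertion order and drops exact repeats.
    rows = list(dict.fromkeys(tuple(line.split("\t")) for line in lines[1:]))
    return header, rows
```

Applied to the 50 rows above, this would leave 7 rows, one per article.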

Feb 20 '22 10:02 WolfgangFahl

Is there any reason you want to do this on Wikidata, instead of using the XML/YAML files we have in this repo?

(FWIW, I'm not aware of any Anthology maintainer being involved in Wikidata, so I would be surprised if any of us could help you there.)

Feb 20 '22 15:02 mbollmann

@mbollmann Thanks for the swift reply. Wikidata is simply a good environment, especially given the Scholia project. See https://scholia.toolforge.org/event-series/Q56571145 for an example event series. https://www.wikidata.org/wiki/Property:P7505 states that there are potentially 50,000 articles. On the ACL Anthology website I found: "The ACL Anthology currently hosts 74465 papers on the study of computational linguistics and natural language processing."
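
If the 50,000 figure is read as the number of papers carrying a P7505 statement, that is roughly 50,000 / 74,465, or about 67%, of the Anthology. To measure the real gap, one could compare the Anthology's own paper IDs with the ?aclId values returned from Wikidata. Here is a minimal sketch; id_coverage is a hypothetical helper, and it assumes both ID lists have already been collected.

```python
from typing import Iterable, List, Tuple


def id_coverage(anthology_ids: Iterable[str], wikidata_ids: Iterable[str]) -> Tuple[float, List[str]]:
    """Fraction of Anthology IDs that also appear on Wikidata, plus the missing IDs (sorted)."""
    anthology = set(anthology_ids)
    linked = anthology & set(wikidata_ids)
    missing = sorted(anthology - linked)
    fraction = len(linked) / len(anthology) if anthology else 0.0
    return fraction, missing


# Rough ratio of the two figures quoted above.
print(f"{50_000 / 74_465:.1%}")  # prints 67.1%
```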

Indeed, I might be interested in analysing the XML/YAML files and looking for conference proceedings. It looks like no bot has yet transferred the entries to Wikidata (cf. the WikiCite project).
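
As a starting point for analysing the XML files, the sketch below lists one row per paper with its full ID and its volume's booktitle. It assumes a layout of <collection id> -> <volume id> (with <meta><booktitle>) -> <paper id> (with <title>), and that new-style IDs are built as collection-volume.paper, as in 2020.lrec-1.520. Old-style IDs such as N18-2039 follow a different scheme and are not handled. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def _text(element: Optional[ET.Element]) -> str:
    """All text inside an element, including text nested in inline markup."""
    return "".join(element.itertext()).strip() if element is not None else ""


def papers_by_volume(collection: ET.Element) -> List[Dict[str, str]]:
    """One row per paper: full ID, title, volume ID and volume booktitle (assumed layout)."""
    collection_id = collection.get("id")
    rows = []
    for volume in collection.findall("volume"):
        volume_id = f"{collection_id}-{volume.get('id')}"
        booktitle = _text(volume.find("meta/booktitle"))
        for paper in volume.findall("paper"):
            rows.append({
                "aclId": f"{volume_id}.{paper.get('id')}",
                "title": _text(paper.find("title")),
                "volumeId": volume_id,
                "booktitle": booktitle,
            })
    return rows
```

Usage would be along the lines of papers_by_volume(ET.parse(path).getroot()) for each collection file in the data/ directory. Grouping the rows by volumeId then gives the list of proceedings to look up on Wikidata.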

Feb 20 '22 20:02 WolfgangFahl

I see. Unfortunately, I'm not familiar with the Scholia project; I do know Wikidata, but I am not aware of any transfer between the ACL Anthology and Wikidata, or who might have done it for the entries that already exist there.

Here's a quick example of what you can get from our Python library (in bin/):

>>> ant = Anthology("../data/")
>>> paper = ant.papers["2020.lrec-1.520"]
>>> ant.volumes[paper.parent_volume_id].get_title()
'Proceedings of the 12th Language Resources and Evaluation Conference'
>>> ant.venues.get_main_venue("2020.lrec-1.520")
'LREC'
>>> ant.venues.get_by_acronym("LREC")["name"]
'International Conference on Language Resources and Evaluation'

...where "2020.lrec-1.520" can be any ACL paper ID, of course. The information is pulled from the XML/YAML files in the data/ directory, so of course you could also use other tools to extract data from them.

Feb 20 '22 20:02 mbollmann
