webchem
get_wdid() searches all of Wikidata, not just chemicals
Currently, get_wdid() searches all Wikidata items, not just chemicals. For example:
get_wdid("Horse", verbose = FALSE)
id match distance query
1 Q869595 Horse 0 Horse
This can be a problem for names that refer to both a chemical and something else, especially acronyms: DDT, for example, returns WDIDs for "Duffy's Tavern Airport" and "Dark Dance Treffen".
However, there is a note in the code that suggests it may be possible to narrow the search:
#! Use SPARQL to search of chemical compounds (P31)?! For a finer / better search?
SPARQL is used in wd_ident() and that's all I know about it!
Related to #82.
Indeed, I also saw the comment about SPARQL a while ago and started working on functions to improve the Wikidata query. I am almost done and will open a PR next week.
Wonderful! I'm concurrently working on a PR to standardize the input and output of all the get_*() functions, and unfortunately get_wdid() is probably one of the functions whose code I changed the most (https://github.com/Aariq/webchem/tree/git-consistency).* Maybe take a look and let me know whether you'd rather I go first with my PR?
*"git" was a typo in the branch name. It's supposed to be "get-consistency".
Yes, go ahead. Once your PR is merged, I will change the code within the function, leaving the standardized structure intact.
PR #242 is now merged
Great! I will file a PR this or next week as suggested above.
Hi @andschar, how's the work on this coming along? As a Wikidata editor, I think I could help out a bit with this one if it's not solved yet.
I mostly wanted to chime in to say that searching by item name with "standard" SPARQL is not particularly efficient and would probably time out a lot (see this for reference).
That said, there is a workaround that combines SPARQL with the MediaWiki API, for example:
SELECT ?item ?itemLabel WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "www.wikidata.org";
                    wikibase:api "EntitySearch";
                    mwapi:search "pyridine";
                    mwapi:language "en".
    ?item wikibase:apiOutputItem mwapi:item.
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en".
  }
  ?item wdt:P31 wd:Q11173 # Guarantees items are 'instances of' a chemical compound
}
The query above searches all item labels and aliases for the string "pyridine" and excludes results that are not "instances of" (P31) "chemical compound" (Q11173), which could help filter out unwanted results.
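For get_wdid() to use this, the search term has to be spliced into the query instead of the hard-coded "pyridine". Below is a minimal sketch in R. build_chem_search_query() is a hypothetical helper, not part of webchem. It escapes backslashes and double quotes so that arbitrary input stays a valid SPARQL string literal, and it reuses the P31/Q11173 filter from the query above unchanged:

```r
# Hypothetical helper (not part of webchem): builds the EntitySearch + P31
# query shown above for an arbitrary search term and language.
build_chem_search_query <- function(term, language = "en") {
  # Escape backslashes first, then double quotes, so the term is a valid
  # SPARQL string literal.
  term <- gsub("\\\\", "\\\\\\\\", term)
  term <- gsub('"', '\\\\"', term)
  sprintf(
'SELECT ?item ?itemLabel WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "www.wikidata.org";
                    wikibase:api "EntitySearch";
                    mwapi:search "%s";
                    mwapi:language "%s".
    ?item wikibase:apiOutputItem mwapi:item.
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "%s". }
  ?item wdt:P31 wd:Q11173 # keep only instances of chemical compound
}', term, language, language)
}

# Usage: build the query for "DDT"; the resulting string could then be sent
# to the Wikidata Query Service endpoint (https://query.wikidata.org/sparql).
cat(build_chem_search_query("DDT"))
```

The same language code is used for both the search and the returned labels. This is a simplification that a real implementation might want to expose as two separate arguments.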