rentrez
Question about entrez_link function
hey @dwinter - tried to figure this out, but you can probably do so much faster.
Question from twitter https://twitter.com/neilfws/status/461109878262493184
entrez_link(dbfrom="pccompound",db="all",id="62857")
gives no results from db "gds", but db="gds" (entrez_link(dbfrom="pccompound",db="gds",id="62857")) gives lots of results from gds.
This appears to be happening on NCBI's end (the XML file for the first query doesn't contain any 'gds' IDs). I have just sent the following email to the EUtils group and will update here when I hear back.
Hello,
I am the maintainer of rentrez, an R library that interfaces with the EUtils API (https://github.com/ropensci/rentrez).
Following a question from a user, I have a question about the meaning of "all" as the destination database for ELink queries. As the user points out, a search for an ID from "pccompound" to "all" doesn't turn up any links to "gds"
(http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pccompound&id=62857&db=all)
But a search on the same ID, with "gds" specified as the destination database, uncovers many linked IDs in that database
(http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pccompound&id=62857&db=gds)
Is this expected behavior? If so, is there some way to tell which databases will be included when db is set to "all" (possibly this table: http://www.ncbi.nlm.nih.gov/entrez/query/static/entrezlinks.html)?
I would love to include a note about this behavior in our documentation.
Thank you in advance for your help on this, David Winter
thanks @dwinter !
Thanks guys; yes, I noticed after posting that the raw EUtils URL returns the same result, so this is unexpected behaviour of db=all at the NCBI end. Hope we hear from them soon.
Just a small update to say I haven't heard from anyone at NCBI other than to say they had received my email. Will report back if I hear anything.
Going to remove the "bug" label because it's not a problem with rentrez, and the thought of having an open bug for this long is annoying to me :)
FYI @neilfws
Digging this one out of the time tunnel @sckott and @neilfws.
I never heard back from Entrez about this, but I've just added some "higher level" functions that at least make what's going on clearer.
entrez_db_links lists all the possible links for a given database (I guess this is what you get from "all"):
devtools::install_github("ropensci/rentrez")  # install_github() comes from the devtools package
library(rentrez)
(links <- entrez_db_links("pccompound"))
#Linked dbs result with the following fields:
# [1] "pccompound_biosystems"
# [2] "pccompound_gene"
# [3] "pccompound_mesh"
# [4] "pccompound_nuccore"
...
There is a little bit of information about each one of these links, but nothing very helpful:
links$pccompound_structure
#$Name
#[1] "pccompound_structure"
#
#$Menu
#[1] "Protein Structures"
#
#$Description
#[1] "Related Protein Structure"
#
#$DbTo
#[1] "structure"
#
Still a mystery to me why you can get information by specifying a database that isn't listed as having linked information, but I guess this at least lets you know what to expect from "all"?
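For anyone who wants to check whether a particular destination database (say "gds") is advertised at all, here is a minimal sketch. It assumes the object returned by entrez_db_links behaves like a plain named list whose elements each carry a DbTo field, as in the output above. To keep it runnable without a network connection it uses a small stand-in list: the link names and the "structure" value come from this thread, the other DbTo values are guesses for illustration, and link_targets is a hypothetical helper, not part of rentrez.

```r
# Stand-in for the result of entrez_db_links("pccompound").
# Link names are taken from the output in this thread; only the DbTo value
# for pccompound_structure was actually shown, the others are assumed.
links <- list(
  pccompound_biosystems = list(Name = "pccompound_biosystems", DbTo = "biosystems"),
  pccompound_gene       = list(Name = "pccompound_gene",       DbTo = "gene"),
  pccompound_mesh       = list(Name = "pccompound_mesh",       DbTo = "mesh"),
  pccompound_nuccore    = list(Name = "pccompound_nuccore",    DbTo = "nuccore"),
  pccompound_structure  = list(Name = "pccompound_structure",  DbTo = "structure")
)

# Hypothetical helper: collect the destination database of every listed link.
link_targets <- function(links) {
  unique(vapply(links, function(link) link$DbTo, character(1)))
}

targets <- link_targets(links)

# Is a given database among the advertised destinations?
"structure" %in% targets
"gds" %in% targets
```

With a real result you would pass entrez_db_links("pccompound") to link_targets in place of the stand-in list; whether "gds" shows up there is exactly the open question in this thread.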