rentrez
Inconsistent behaviour and unnecessary warning when entrez_link called with single ID and by_id = TRUE
Thanks for developing rentrez, it is excellent.
The man page for entrez_link says when by_id = TRUE, a list of elink objects will be returned, one for each ID in id. This works for the example shown in the tutorial:
> all_links_sep <- entrez_link(db="protein", dbfrom="gene", id=c("93100", "223646"), by_id=TRUE)
> all_links_sep
List of 2 elink objects,each containing
$links: IDs for linked records from NCBI
> all_links_sep[[1]]$links$gene_protein
[1] "1387845369" "1387845338" "1370513171" "1370513169" "1034662000" "1034661998" "1034661996" "1034661994" "1034661992" "558472750" "545685826"
[12] "194394158" "166221824" "154936864" "122346659" "119602646" "119602645" "119602644" "119602643" "119602642" "37787309" "37787307"
[23] "37787305" "33991172" "21619615" "10834676"
> all_links_sep[[1]]$links$gene_protein_refseq
[1] "1387845369" "1387845338" "1370513171" "1370513169" "1034662000" "1034661998" "1034661996" "1034661994" "1034661992" "558472750" "194394158"
But this is what happens with only a single ID:
> one_link_sep <- entrez_link(db="protein", dbfrom="gene", id="93100", by_id=TRUE)
Warning message:
In entrez_link(db = "protein", dbfrom = "gene", id = "93100", by_id = TRUE) :
Some IDs appear to be invalid. Result containg no information for the following IDs: 93100 ,
> one_link_sep
elink object with contents:
$links: IDs for linked records from NCBI
> one_link_sep$links$gene_protein
[1] "1387845369" "1387845338" "1370513171" "1370513169" "1034662000" "1034661998" "1034661996" "1034661994" "1034661992" "558472750" "545685826"
[12] "194394158" "166221824" "154936864" "122346659" "119602646" "119602645" "119602644" "119602643" "119602642" "37787309" "37787307"
[23] "37787305" "33991172" "21619615" "10834676"
> one_link_sep$links$gene_protein_refseq
[1] "1387845369" "1387845338" "1370513171" "1370513169" "1034662000" "1034661998" "1034661996" "1034661994" "1034661992" "558472750" "194394158"
The link is returned, but as a single link, not as a list with one link. And an unnecessary warning is produced - the link's data is returned with no problems.
Please could this be returned as a list containing a single link, instead of just the single link, and the warning removed? I realise this is a slightly odd request - why use by_id with only one ID? It's because I'm running upstream queries that return different (unknown) numbers of IDs, sometimes returning only a single ID, and I want the output to always be a list so I can process it consistently. Otherwise, I need to check every return value of entrez_link to see whether it returned a single value or a list, and I need to suppress the warning, as the output is fine.
It doesn't look like the code in this repo will be updated any time soon, so you'll have to institute a workaround.
rentrez does output entrez_link() results with different classes if it is a single result or list. I handled this situation when extracting PubMed IDs by defining s3 methods for the two different outputs elink (single result) and elink_list (list of results). See https://github.com/allenbaron/DO.utils/blob/632093c8ea37ac46a18ae559a4a0ea59395edbd4/R/extract.R#L77-L188.
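A minimal sketch of that S3 approach, in case it helps. It is not the code from the linked file: the generic name linked_ids and its link_name argument are made up for illustration, and the only things assumed about rentrez are the two class names, elink and elink_list, and the $links element shown in the output above.

```r
# Hypothetical generic (not part of rentrez): return linked IDs as a plain
# list with one element per queried ID, whichever class entrez_link() gave.
linked_ids <- function(x, link_name) UseMethod("linked_ids")

# Single result (class "elink"): wrap the ID vector in a list so the shape
# matches what the by_id = TRUE case produces for several IDs.
linked_ids.elink <- function(x, link_name) {
  list(x$links[[link_name]])
}

# List of results (class "elink_list"): pull the same link set out of each
# element.
linked_ids.elink_list <- function(x, link_name) {
  lapply(x, function(el) el$links[[link_name]])
}
```

With this, linked_ids(one_link_sep, "gene_protein") and linked_ids(all_links_sep, "gene_protein") should both come back as lists, so downstream code can use lapply() either way. The spurious warning in the single-ID case is a separate matter; wrapping the entrez_link() call in suppressWarnings() would silence it, but it would also hide genuine invalid-ID warnings.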
I created a fork of rentrez to fix another bug (see PR #174). You're welcome to submit a pull request there. I'd be happy to merge it.