DGE_workshop_salmon_online
Orgdb note didn't work in Gene annotation
This is a draft of how to fix it:
# check available updated database
query(ah, 'org.Hs.eg.db.sqlite')
human_orgdb <- query(ah, c("Homo sapiens", "OrgDb"))
test <- human_orgdb[["AH111575"]]
test
query(ah, 'org.Hs.eg.db.sqlite') is the key step, because which records are available depends on the Bioconductor version.
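Unfortunately, while this works, the following line from the original code, which uses select(), will not work on it without specifying the package select() is from. The UseMethod("select") in the error suggests that another attached package (probably dplyr) is masking AnnotationDbi's select():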
human_orgdb <- human_orgdb[["AH111575"]]
annotations_orgdb <- select(human_orgdb, res_tableOE_tb$gene, c("SYMBOL", "GENENAME", "ENTREZID"), "ENSEMBL")
Error in UseMethod("select") :
no applicable method for 'select' applied to an object of class "c('OrgDb', 'AnnotationDb', 'envRefClass', '.environment', 'refClass', 'environment', 'refObject', 'AssayData')"
This, however, will work (with a warning):
human_orgdb <- human_orgdb[["AH111575"]]
annotations_orgdb <- AnnotationDbi::select(human_orgdb, res_tableOE_tb$gene, c("SYMBOL", "GENENAME", "ENTREZID"), "ENSEMBL")
'select()' returned 1:many mapping between keys and columns
This also implies that the note itself is wrong, since it says these 1:many mappings are automatically removed.
In fact, there are duplicate Entrez IDs:
> annotations_orgdb<-annotations_orgdb[!is.na(annotations_orgdb$ENTREZID),]
> dim(annotations_orgdb)
[1] 36282 4
> sum(is.na(annotations_orgdb$ENTREZID))
[1] 0
> sum(duplicated(annotations_orgdb$ENTREZID))
[1] 272
Unless there is a different select() that produces a working result.
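For reference, one way to remove the duplicates by hand is base R's duplicated(). This is only a minimal sketch on a made-up data frame: the IDs are invented placeholders, annotations_toy is a hypothetical stand-in for annotations_orgdb, and keeping the first row per key is an arbitrary assumption, not necessarily what the lesson should do.

```r
# Toy stand-in for the data frame returned by AnnotationDbi::select();
# every ID below is an invented placeholder, not a real annotation.
annotations_toy <- data.frame(
  ENSEMBL  = c("ENSG_A", "ENSG_A", "ENSG_B", "ENSG_D", "ENSG_C"),
  SYMBOL   = c("GENE1",  "GENE1",  "GENE2",  "GENE2",  "GENE3"),
  ENTREZID = c("101",    "102",    "103",    "103",    NA),
  stringsAsFactors = FALSE
)

# Drop rows without an Entrez ID, as in the session above.
annotations_toy <- annotations_toy[!is.na(annotations_toy$ENTREZID), ]

# Keep only the first row per ENSEMBL key (removes the 1:many key -> column rows).
dedup_ensembl <- annotations_toy[!duplicated(annotations_toy$ENSEMBL), ]

# Several ENSEMBL IDs can still share one Entrez ID, so check that separately.
n_dup_entrez <- sum(duplicated(dedup_ensembl$ENTREZID))

# Optionally keep only the first row per Entrez ID as well.
dedup_both <- dedup_ensembl[!duplicated(dedup_ensembl$ENTREZID), ]
```

Which of the duplicated rows to keep depends on the downstream analysis, so the "first row wins" rule here is only a placeholder.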
Fixed the issue with multiple mapping:
https://github.com/hbctraining/Intro-to-DGE/commit/cf20c19ecafb80f892dfa48b5d1b464f4d742027