GOSemSim
results from function `mgoSim` based on "Wang" method
I have found that the ontology used to build hsGO (BP, MF, or CC) has no effect on the similarity results. R code below:
go1 <- c("GO:0000005","GO:0000007") # MF
go2 <- c("GO:0005385", "GO:0004553") # MF
go3 <- c("GO:0000017", "GO:0000014") # BP + MF
hsGO <- godata('org.Hs.eg.db', ont="BP", computeIC=FALSE)
mgoSim(go1,go2,semData=hsGO,measure="Wang",combine="BMA") # 0.473
mgoSim(go1,go3,semData=hsGO,measure="Wang",combine="BMA") # 0.016
hsGO <- godata('org.Hs.eg.db', ont="MF", computeIC=FALSE)
mgoSim(go1,go2,semData=hsGO,measure="Wang",combine="BMA") # 0.473
mgoSim(go1,go3,semData=hsGO,measure="Wang",combine="BMA") # 0.016
hsGO <- godata('org.Hs.eg.db', ont="CC", computeIC=FALSE)
mgoSim(go1,go2,semData=hsGO,measure="Wang",combine="BMA") # 0.473
mgoSim(go1,go3,semData=hsGO,measure="Wang",combine="BMA") # 0.016
Are these results reasonable?
I have been having similar issues with the Wang method. When I input the same terms with different ontologies, I always get back exactly the same results.
I went through the functions in WangMethod.R and found the following lines in the getSV function:
Lines 58-61:
if( exists(ID, envir=.SemSimCache) ) {
    sv <- get(ID, envir=.SemSimCache)
    return(sv)
}
Lines 108-112:
if( ! exists(ID, envir=.SemSimCache) ) {
    assign(ID,
           sv,
           envir=.SemSimCache)
}
getSV stores the semantic value of an ID in the .SemSimCache environment the first time it is computed. The next time the semantic value of the same ID is requested, it is retrieved from the environment instead of being recomputed. The problem is that the cache is keyed on the ID alone, so if you request the semantic value of the same ID under a different ontology, you always get back the value from the first run. A quick workaround is to clear the .SemSimCache environment before each run that reuses the same IDs. In your case you can do:
remove(list = ls(envir = .SemSimCache), envir = .SemSimCache)
hsGO <- godata('org.Hs.eg.db', ont="BP", computeIC=FALSE)
mgoSim(go1,go2,semData=hsGO,measure="Wang",combine="BMA") # 0
mgoSim(go1,go3,semData=hsGO,measure="Wang",combine="BMA") # 0
remove(list = ls(envir = .SemSimCache), envir = .SemSimCache)
hsGO <- godata('org.Hs.eg.db', ont="MF", computeIC=FALSE)
mgoSim(go1,go2,semData=hsGO,measure="Wang",combine="BMA") # 0.473
mgoSim(go1,go3,semData=hsGO,measure="Wang",combine="BMA") # 0.016
remove(list = ls(envir = .SemSimCache), envir = .SemSimCache)
hsGO <- godata('org.Hs.eg.db', ont="CC", computeIC=FALSE)
mgoSim(go1,go2,semData=hsGO,measure="Wang",combine="BMA") # 0
mgoSim(go1,go3,semData=hsGO,measure="Wang",combine="BMA") # 0
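To make the caching behaviour described above concrete, here is a minimal, self-contained sketch that does not use GOSemSim at all. The environment `cache` and the functions `compute_sv` and `get_sv_by_id` are made-up stand-ins for illustration, not the package's own code; the returned strings are placeholders for real semantic values.

```r
# Toy stand-in for the package cache; all names here are illustrative only.
cache <- new.env()

# Hypothetical lookup whose result depends on the ontology.
compute_sv <- function(id, ont) {
  paste("sv of", id, "in", ont)
}

# Cache keyed on the ID alone, as in the lines quoted from getSV.
get_sv_by_id <- function(id, ont) {
  if (exists(id, envir = cache, inherits = FALSE)) {
    return(get(id, envir = cache))
  }
  sv <- compute_sv(id, ont)
  assign(id, sv, envir = cache)
  sv
}

first  <- get_sv_by_id("GO:0000005", "BP")
second <- get_sv_by_id("GO:0000005", "MF")  # served from the cache: still the BP value

# Clearing the cache between runs forces a recomputation.
rm(list = ls(envir = cache, all.names = TRUE), envir = cache)
third <- get_sv_by_id("GO:0000005", "MF")
```

Here `second` is identical to `first` even though a different ontology was requested, while `third` reflects the MF request because the cache was emptied first.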
I have also attached a script with a modified getSV function (lightly tested). It keys the cache on the ID together with the ontology, so a stored value is retrieved only when both the ID and the ontology match.
getSV <- function(ID, ont, rel_df, weight=NULL) {
    ID_ont <- paste(ID, ont, sep = ":")
    if (!exists(".SemSimCache")) .initial()
    .SemSimCache <- get(".SemSimCache", envir=.GlobalEnv)
    if( exists(ID_ont, envir=.SemSimCache) ) {
        sv <- get(ID_ont, envir=.SemSimCache)
        return(sv)
    }
    if (ont == "DO") {
        topNode <- "DOID:4"
    } else {
        topNode <- "all"
    }
    if (ID == topNode) {
        sv <- 1
        names(sv) <- topNode
        return (sv)
    }
    if (is.null(weight)) {
        weight <- c(0.8, 0.6, 0.7)
        names(weight) <- c("is_a", "part_of", "other")
    }
    rel_df <- rel_df[rel_df$Ontology == ont,]
    if (! 'relationship' %in% colnames(rel_df))
        rel_df$relationship <- "other"
    rel_df$relationship[!rel_df$relationship %in% c("is_a", "part_of")] <- "other"
    sv <- 1
    names(sv) <- ID
    allid <- ID
    idx <- which(rel_df[,1] %in% ID)
    while (length(idx) != 0) {
        p <- rel_df[idx,]
        pid <- p$parent
        allid <- c(allid, pid)
        sv <- c(sv, weight[p$relationship]*sv[p[,1]])
        names(sv) <- allid
        idx <- which(rel_df[,1] %in% pid)
    }
    sv <- sv[!is.na(names(sv))]
    sv <- sv[!duplicated(names(sv))]
    if(ont != "DO")
        sv[topNode] <- 0
    if( ! exists(ID_ont, envir=.SemSimCache) ) {
        assign(ID_ont,
               sv,
               envir=.SemSimCache)
    }
    return(sv)
}
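For anyone who wants to see what the while loop in getSV does without loading the package, below is a small sketch that repeats the same ancestor walk on a made-up four-node graph. The table `rel_df`, the node names, and the helper `toy_sv` are assumptions for illustration only; caching, the ontology filter, and the top-node handling are left out.

```r
# Made-up relationship table: child ID in column 1, then parent and edge type.
rel_df <- data.frame(
  id           = c("C", "C", "A", "B"),
  parent       = c("A", "B", "root", "root"),
  relationship = c("is_a", "part_of", "is_a", "is_a"),
  stringsAsFactors = FALSE
)

# Default edge weights, as in the function above.
weight <- c(is_a = 0.8, part_of = 0.6, other = 0.7)

# Hypothetical helper that repeats the ancestor walk from getSV, minus caching.
toy_sv <- function(ID, rel_df, weight) {
  sv <- 1
  names(sv) <- ID
  allid <- ID
  idx <- which(rel_df[, 1] %in% ID)
  while (length(idx) != 0) {
    p <- rel_df[idx, ]
    pid <- p$parent
    allid <- c(allid, pid)
    # Each parent gets (edge weight) * (value already assigned to the child).
    sv <- c(sv, weight[p$relationship] * sv[p[, 1]])
    names(sv) <- allid
    idx <- which(rel_df[, 1] %in% pid)
  }
  sv <- sv[!is.na(names(sv))]
  # Keep only the first value recorded for each ancestor.
  sv[!duplicated(names(sv))]
}

sv_C <- toy_sv("C", rel_df, weight)
print(sv_C)
```

In this toy graph `root` is reached through both `A` and `B`. The walk records 0.8 * 0.8 via `A` first, and the later 0.8 * 0.6 via `B` is dropped by the duplicate filter.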