clusterProfiler
New feature: simplify to parent GO terms
I propose to develop a new function that uses a combination of simplify() and getGOLevel() to reduce the redundancy of GO terms. The main idea is that if two terms have a high similarity, the more general term will be retained, which tends to select the parent GO terms. The code that I use for this is provided below, though it will need to be modified slightly. I also recommend saving the list of GO terms at each level and including it as an object in clusterProfiler to speed up computation on the user's end.
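The caching suggestion could be sketched as follows once the per-level lists have been built. This is only an illustration: the stand-in `ont_list` holds just the three ontology root terms, and the file name and `tempdir()` location are hypothetical choices, not anything clusterProfiler provides.

```r
# Stand-in for the nested list of GO terms per level (hypothetical content:
# only the root term of each ontology, as a single "level 1" entry)
ont_list <- list(
  BP = list("GO:0008150"),
  CC = list("GO:0005575"),
  MF = list("GO:0003674")
)

# Hypothetical cache location; a real workflow would pick a persistent path
cache_file <- file.path(tempdir(), "GO_levels_by_ontology.rds")

# Save once, then reload in later sessions instead of recomputing
saveRDS(ont_list, cache_file)
restored <- readRDS(cache_file)
```

Reading the cached object back avoids walking the GO hierarchy again in every session.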
# For each ontology, get a list of terms at each level from 1:14
# Note: This fails to get terms from level 15 and above for some reason.
# Levels are defined here: https://david.ncifcrf.gov/content.jsp?file=FAQs.html#9
onts <- c("BP", "CC", "MF")
# Create a list of 3 lists
ont_list <- lapply(onts, function(ont) {
  # For levels 1-14, get the GO terms
  lapply(1:14, function(level) {
    clusterProfiler:::getGOLevel(ont, level)
  })
})
# Set the names to easily access elements later
names(ont_list) <- onts
# Recommended to save ont_list
library(clusterProfiler) # simplify()
library(dplyr)           # mutate()
library(magrittr)        # %<>%
# x is an object of class gseaResult, in this case
ont <- x@setType # Use x@setType for GSEA results and x@ontology for SEA results
# For each ID, get the levels where it appears.
# Also get the first (most general) level where it appears.
x %<>%
  mutate(
    # All levels (simplify = FALSE always returns a list, one element per ID)
    GO_levels = sapply(ID, function(ID) {
      # Alternatively, use min(which(...)) to just get the most general level
      which(
        unlist(lapply(ont_list[[ont]], function(s) ID %in% s))
      )
    }, simplify = FALSE),
    # Most general level (lowest value)
    first_GO_level = sapply(GO_levels, min)
  )
# If GO terms have a high similarity, select the more general one
x <- simplify(x, cutoff = 0.7, by = "first_GO_level", select_fun = min)
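To make the selection rule concrete, here is a toy sketch in base R. It does not reproduce how simplify() works internally; the term names, the `cluster` column (standing in for "terms whose similarity exceeds the cutoff"), and the level values are all made up for illustration.

```r
# Hypothetical results: two groups of mutually similar terms
res <- data.frame(
  ID             = c("term_a", "term_b", "term_c", "term_d"),
  cluster        = c(1, 1, 2, 2),   # assumed grouping of highly similar terms
  first_GO_level = c(3, 5, 4, 2),   # lower value = more general term
  stringsAsFactors = FALSE
)

# Within each group, keep the row with the smallest first_GO_level
keep <- unlist(lapply(split(seq_len(nrow(res)), res$cluster), function(idx) {
  idx[which.min(res$first_GO_level[idx])]
}))

kept_ids <- res[keep, "ID"]
kept_ids   # "term_a" "term_d"
```

Passing `by = "first_GO_level"` with `select_fun = min` asks for the same kind of choice: among similar terms, the one with the lowest level value is retained.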
@TylerSagendorf Hi, maybe this is unrelated to your question, but which version did you use? I am using the latest version and found that the getGOLevel() function was removed from this package. I am having trouble converting GO IDs to GO levels.
I am using clusterProfiler 4.0.0, but getGOLevel is not an exported function. That is why I use the triple colon (:::) to access it. Also keep in mind that each ID may have multiple levels, depending on its relationship to other terms in the DAG.
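A small toy DAG shows why one ID can sit at several levels. The terms below are hypothetical (not real GO IDs), and the convention that the root is level 1 is an assumption chosen to match "lowest value = most general" in the code above.

```r
# Toy DAG: each term maps to its direct parents
parents <- list(
  root = character(0),
  A    = "root",
  B    = "A",
  C    = c("root", "B")   # C is reachable through two paths of different length
)

# All levels at which `term` appears: the root is level 1, and a term is
# one level below each of its parents along every path
term_levels <- function(term, parents) {
  p <- parents[[term]]
  if (length(p) == 0) return(1L)
  sort(unique(unlist(lapply(p, function(q) term_levels(q, parents) + 1L))))
}

levels_by_term <- lapply(setNames(names(parents), names(parents)),
                         term_levels, parents = parents)

levels_by_term$C        # 2 4
min(levels_by_term$C)   # 2, the most general level for C
```

Taking the minimum, as `first_GO_level` does, picks the shortest path from the root.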
Thanks so much for this, as it is a very useful function! There are just 2 adjustments I had to make:
- There is a missing 's' in "clusterProfiler:::getGOLevel(ont, level)" -> shouldn't it be "onts"?
- There is an extra ')' in the which( ) function
After that, it worked perfectly. Looking forward to an implementation in the package.