New feature: simplify to parent GO terms

Open TylerSagendorf opened this issue 2 years ago • 3 comments

I propose to develop a new function that uses a combination of simplify and getGOLevel to reduce the redundancy of GO terms. The main idea is that if two terms have a high similarity, the more general term will be retained. This is likely to select the parent GO terms. The code that I use for this is provided below, though it will need to be modified slightly. I also recommend saving the list of GO terms at each level and including it as an object in clusterProfiler to speed up computation on the user's end.

# For each ontology, get a list of terms at each level from 1:14
# Note: This fails to get terms from level 15 and above for some reason.
# Levels are defined here: https://david.ncifcrf.gov/content.jsp?file=FAQs.html#9

library(clusterProfiler)
library(magrittr)  # provides %<>%

onts <- c("BP", "CC", "MF")

# Create a list of 3 lists
ont_list <- lapply(onts, function(ont) {
  # For levels 1-14, get the GO terms
  lapply(1:14, function(level) {
    clusterProfiler:::getGOLevel(ont, level)
  })
})
# Set the names to easily access elements later
names(ont_list) <- onts 

# Recommended to save ont_list

# x is an object of class gseaResult, in this case
ont <- x@setType # Use x@setType for GSEA results and x@ontology for SEA results

# For each ID, get the levels where it appears.
# Also get the first (most general) level where it appears.
x %<>%
  mutate(
    # All levels
    GO_levels = sapply(ID, function(ID) {
      # Alternatively, use min(which(...)) to just get the most general level
      which(
        unlist(lapply(ont_list[[ont]], function(s) ID %in% s))
      )
    }),
    # Most general level (lowest value)
    first_GO_level = unlist(lapply(GO_levels, min))
  )

# If GO terms have a high similarity, select the more general one
x <- simplify(x, cutoff = 0.7, by = "first_GO_level", select_fun = min)
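
The level lookup above can be tried without clusterProfiler. The following is a minimal sketch in base R, assuming a made-up `toy_levels` list standing in for `ont_list[[ont]]` and a hypothetical helper `levels_of()`; the term IDs are invented for illustration only.

```r
# Toy stand-in for ont_list[["BP"]]: one character vector of term IDs per level.
# These IDs are invented; real lists would come from getGOLevel().
toy_levels <- list(
  c("GO:A"),           # level 1
  c("GO:B", "GO:C"),   # level 2
  c("GO:C", "GO:D")    # level 3
)

# Hypothetical helper: all levels at which a term ID appears
levels_of <- function(id, levels) {
  which(vapply(levels, function(s) id %in% s, logical(1)))
}

ids <- c("GO:A", "GO:C", "GO:D")

# All levels per ID (a list, since an ID can appear at several levels)
GO_levels <- lapply(ids, levels_of, levels = toy_levels)

# Most general level per ID (lowest value)
first_GO_level <- vapply(GO_levels, min, integer(1))
```

Here `GO:C` appears at levels 2 and 3, so its most general level is 2. Passing `select_fun = min` to `simplify()` on such a column is what favors the more general term among similar ones.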

TylerSagendorf avatar Sep 19 '21 01:09 TylerSagendorf

@TylerSagendorf Hi, this may be unrelated to your proposal, but which version did you use? I am using the latest version and found that the getGOLevel() function has been removed from the package. I am having trouble converting GO IDs to GO levels.

WJH58 avatar Jan 22 '22 09:01 WJH58

I am using clusterProfiler 4.0.0, but getGOLevel is not an exported function. That is why I use the triple colon (:::) to access it. Also keep in mind that each ID may appear at multiple levels, depending on its relationship to other terms in the DAG.
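
To illustrate the point about multiple levels, here is a small sketch in base R. It assumes levels are built by repeatedly taking children starting from the root, which may not match exactly what getGOLevel does internally; the `children` map and `terms_at_level()` are hypothetical and the term names are made up.

```r
# Hypothetical parent -> children map for a tiny DAG.
# "C" is a direct child of the root and also a child of "A".
children <- list(
  root = c("A", "C"),
  A    = "C",
  C    = character(0)
)

# Terms reached after (level - 1) steps down from the root
terms_at_level <- function(level, children, root = "root") {
  terms <- root
  for (i in seq_len(level - 1)) {
    terms <- unique(unlist(children[terms], use.names = FALSE))
  }
  terms
}

levels <- lapply(1:3, terms_at_level, children = children)

# Levels at which "C" appears
c_levels <- which(vapply(levels, function(s) "C" %in% s, logical(1)))
```

Because "C" can be reached by paths of different lengths, it shows up at both level 2 and level 3 in this toy DAG.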

TylerSagendorf avatar Jan 24 '22 22:01 TylerSagendorf

Thank you so much for this; it is a very useful function! There were just 2 adjustments I had to make:

  • There is a missing 's' in "clusterProfiler:::getGOLevel(ont, level)" -> shouldn't it be "onts"?
  • There is an extra ')' in the which( ) function

After that, it worked perfectly. Looking forward to an implementation in the package.

Starahoush avatar Jul 28 '22 23:07 Starahoush