tidybulk
test_gene_overrepresentation: visualising results
Do you have any recommendations for visualising the output of test_gene_overrepresentation?
Could or should it use a clusterProfiler visualisation method (https://yulab-smu.github.io/clusterProfiler-book/), or a ggplot2 one?
Good question
In the past I have done something like this for upregulated and downregulated genes, but it is not so great.
The website has better examples, though. I'm wondering whether I'm erasing key information that is needed to build such plots.
Just to follow up here: I don't think this issue is a high priority (for the workshop anyway), as we may not have time for any pathway analyses. If needed, though, we could have the code below as a suggestion for how to visualise results. It doesn't use test_gene_overrepresentation; it uses clusterProfiler itself on the tidybulk test_differential_abundance output, in tidyverse style, so all the clusterProfiler plots are available.
library(dplyr)
library(tidybulk)
library(clusterProfiler)
library(org.Hs.eg.db)

# extract all genes tested for DE
res <- counts_de_pretty %>%
  pivot_transcript() %>%
  filter(!lowly_abundant)

# background: every gene that was tested
universe <- res %>% pull("transcript")

# foreground: significantly upregulated genes
up_genes <- res %>%
  filter(FDR < 0.1 & logFC > 0) %>%
  pull("transcript")

# GO terms
# note: ont = "BP" selects Biological Process terms, despite the object name egoCC
egoCC <- enrichGO(
  gene = up_genes,
  OrgDb = org.Hs.eg.db,
  keyType = "SYMBOL",
  ont = "BP",
  universe = universe
)

dotplot(egoCC)
goplot(egoCC)
# recent enrichplot versions need term similarities first:
# emapplot(enrichplot::pairwise_termsim(egoCC))
emapplot(egoCC)

# MSigDB Hallmark
gmtH <- read.gmt("https://data.broadinstitute.org/gsea-msigdb/msigdb/release/6.2/h.all.v6.2.symbols.gmt")

enrH <- enricher(
  gene = up_genes,
  TERM2GENE = gmtH,
  universe = universe
)

dotplot(enrH)
emapplot(enrH)
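If we wanted a ggplot2 version instead, one option might be to coerce the enrichment result to a data frame with as.data.frame(enrH) and plot it directly. Below is a minimal sketch of that idea. The rows are made up for illustration (the set names and numbers are not real results); only the column names (Description, GeneRatio, p.adjust, Count) follow the clusterProfiler result table.

```r
library(dplyr)
library(ggplot2)

# Made-up rows mimicking the columns of as.data.frame(enrH);
# in real use this would be: enr_tbl <- as.data.frame(enrH)
enr_tbl <- data.frame(
  Description = c("HALLMARK_A", "HALLMARK_B", "HALLMARK_C"),
  GeneRatio = c("12/150", "8/150", "5/150"),
  p.adjust = c(0.001, 0.02, 0.04),
  Count = c(12L, 8L, 5L),
  stringsAsFactors = FALSE
)

plot_tbl <- enr_tbl %>%
  mutate(
    # GeneRatio is stored as text such as "12/150"; convert it to a number
    gene_ratio = vapply(
      strsplit(GeneRatio, "/"),
      function(x) as.numeric(x[1]) / as.numeric(x[2]),
      numeric(1)
    ),
    # order gene sets by gene ratio so the largest is drawn at the top
    Description = reorder(Description, gene_ratio)
  )

# dot plot: position = gene ratio, size = gene count, colour = adjusted p-value
p <- ggplot(plot_tbl, aes(x = gene_ratio, y = Description, size = Count, colour = p.adjust)) +
  geom_point() +
  scale_colour_gradient(low = "red", high = "blue") +
  labs(x = "Gene ratio", y = NULL, colour = "Adjusted p-value", size = "Gene count")

print(p)
```

This roughly imitates what dotplot() shows, but keeps everything in a plain data frame, so the up- and downregulated results could be bound together and facetted if needed.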
I'm lost with this issue? Is it still relevant?
Well I think we should have a tidybulk pathway/gene set analysis section at some point for a workshop.
For the moment I just put some info in the supplementary here https://stemangiola.github.io/biocasia2020_tidytranscriptomics/articles/supplementary.html#how-to-perform-gene-enrichment-analysis-1
But it doesn't use the tidybulk pathway analysis; it just uses the tidybulk DE results and then the clusterProfiler viz:
dotplot(egoCC)
goplot(egoCC)
emapplot(egoCC)
Not sure whether better to use clusterprofiler for the viz or try to visualise tidybulk pathway results?
tidybulk can be used for the calculation, and attr(..., "") can be used to extract the raw results and plot them. Not sure if it's too messy. OK, let's keep thinking about this.
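For anyone unfamiliar with the attr() approach, here is a generic base R sketch of how an object attached as an attribute can be discovered and pulled out. The data frame is a toy stand-in, not a tidybulk result, and the attribute name "raw_result" is hypothetical; the actual name tidybulk uses should be looked up with attributes() on the real output.

```r
# Toy stand-in for a result table; "raw_result" is a hypothetical attribute name
toy <- data.frame(pathway = c("A", "B"), p_value = c(0.01, 0.2))
attr(toy, "raw_result") <- list(method = "example", n_tested = 2L)

# List the attribute names to see what is stored alongside the table
attr_names <- names(attributes(toy))
print(attr_names)

# Pull out one attribute by name
raw <- attr(toy, "raw_result")
str(raw)
```

Note that many dplyr verbs rebuild the table, so it is probably safest to extract the attribute before any further wrangling.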
Is this still relevant? @mblue9 any interest in doing a blog post on pathway analyses with tidybulk? so there would be a real application for me to get this improved.
For me at the moment it's not a very high priority but I'd be happy to write a blog post if you want to focus on improving this aspect. Or we can wait til we have more time to work on it.
Just noting here I have a tiny bit on tidybulk pathway analysis here which we could build on using that dataset or airway or another https://mblue9.github.io/RNAseq-R-tidyverse/articles/tidytranscriptomics.html#gene-set-testing-1