
ssGSEA Conceptual Questions

Open abadgerw opened this issue 1 year ago • 4 comments

@rcastelo Thanks for such a great package and support. Apologies in advance, I just had two conceptual questions about ssGSEA and how to extract some information:

  1. What does the alpha weight represent conceptually? Can a metric that indicates the contribution of each protein to the overall enrichment score be extracted, similar to what a PCA loading tells us?

  2. Is there a way to understand how coordinated the expression of the pathway members is? In other words, how co-expressed are they? Would looking at the ranks of each pathway member help (i.e., whether they cluster together or not)? Is this easily extractable or quantifiable?

abadgerw avatar Oct 10 '24 12:10 abadgerw

@rcastelo In doing some digging, I think this query may relate to the following: https://github.com/rcastelo/GSVA/issues/69

However, given that my data has missing values, I can't use the Shiny app to extract the plots you've mentioned in that thread. Are there plans to make it possible to generate leading-edge information directly in R, outside the interactive Shiny app?

abadgerw avatar Oct 12 '24 18:10 abadgerw

@rcastelo Hope you are doing well. I wanted to check in and see if you or your colleagues had any insight into the queries posited here? Thanks again for a fantastic tool!

abadgerw avatar Oct 21 '24 09:10 abadgerw

Hi, sorry for not getting back to you earlier. We're working hard towards the next release of the software, which should happen on Wednesday 30th, next week, and this new release should hopefully address your requests. If you can wait a few more days, I'll get back to you once all the new functionality is in place so that you can try it and give us feedback. Thanks for your patience.

rcastelo avatar Oct 21 '24 10:10 rcastelo

No worries and sounds great! Thank you for all your efforts with making such a great package!

abadgerw avatar Oct 22 '24 10:10 abadgerw

Hi, a new version 2.0 of GSVA was released yesterday, please follow the instructions at https://bioconductor.org/install to install it, and let us know if this version addresses your feature requests.

rcastelo avatar Oct 31 '24 16:10 rcastelo

Thanks, @rcastelo! I have given it a try and had a couple questions:

  1. When trying the gsvaEnrichment function on my output of gsvaRanks, I ran into the following error: Error in if (any(mask)) { : missing value where TRUE/FALSE needed

  2. To confirm my understanding, the exprData found after the gsvaRanks function displays ranks from 1 (highest) to the maximum number of genes used as input? Conceptually, if I subset this data frame to display just the genes in my gene set, then the genes that are closer to 1 are what is driving the score for that case?

  3. When using the Shiny app to run the same GSVA analysis that I did in RStudio with data that has missing values, I receive the following error (not seen with the ssgsea method): Error in .gsva_score_genesets(geneSetsIdx, decOrdStat = rnkstats$dos, : REAL() can only be applied to a 'numeric', not a 'integer'

Thank you so much for your help!

abadgerw avatar Nov 04 '24 01:11 abadgerw

Hi @abadgerw,

  1. Could you post the code that causes the error?
  2. No, this is not that simple, and in principle, you should not need to access the internal slot exprData. An approach to answer the question about what genes "drive the enrichment score" is to identify the leading edge subset, which you can obtain with the function gsvaEnrichment().
  3. Could you post the version of GSVA that you are using?
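To make the leading-edge idea and the role of the alpha weight concrete, here is a minimal base-R sketch of an ssGSEA-style weighted random walk for a single sample. It is a toy illustration under stated assumptions, not GSVA's implementation: the data, the gene names and the helper `random_walk()` are all hypothetical, weights are assumed to be rank^alpha with the most expressed gene getting the highest rank, and only the positive peak of the walk is used to define the leading edge. Ties and missing values are not handled.

```r
# Hypothetical toy data: 10 genes, g1 is the most expressed.
expr <- setNames(10:1, paste0("g", 1:10))
gene_set <- c("g1", "g2", "g8")   # hypothetical gene set

# Sketch of a weighted random walk (assumption: weight = rank^alpha).
random_walk <- function(expr, gene_set, alpha = 0.25) {
  genes <- names(sort(expr, decreasing = TRUE))  # most expressed gene first
  n <- length(genes)
  in_set <- genes %in% gene_set
  w <- (n:1)^alpha                               # top gene has rank n

  # Relative step that each gene-set member contributes to the walk.
  steps <- w[in_set] / sum(w[in_set])
  names(steps) <- genes[in_set]

  hit <- cumsum(ifelse(in_set, w, 0)) / sum(w[in_set])  # weighted steps up
  miss <- cumsum(!in_set) / sum(!in_set)                # uniform steps down
  walk <- hit - miss

  peak <- which.max(walk)   # position of the maximum positive deviation
  list(walk = walk,
       score = sum(walk),   # ssGSEA-style score: sum of the differences
       steps = steps,
       peak = peak,
       leading_edge = genes[in_set & seq_len(n) <= peak])
}

res <- random_walk(expr, gene_set)
res$leading_edge                              # "g1" "g2"
random_walk(expr, gene_set, alpha = 0)$steps  # equal steps for every member
random_walk(expr, gene_set, alpha = 1)$steps  # steps proportional to the rank
```

In this sketch, a larger alpha gives more weight to gene-set members near the top of the ranking, and the leading edge is the subset of members that appear before the walk reaches its maximum. In the package itself, the leading edge should be obtained through gsvaEnrichment() rather than recomputed by hand.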

rcastelo avatar Nov 05 '24 17:11 rcastelo

Thanks, @rcastelo!

The code is as follows:

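library(GSVA)

# Gene sets: one column per gene set in the CSV, converted to a list
gs <- read.csv("Gene Lists.csv")
gs <- as.list(gs)

# Expression data, transposed because GSVA expects features in rows
# and samples in columns
data <- read.csv("Final Data.csv", header = TRUE, row.names = 1)
data <- t(data)
data <- as.matrix(data)

# use = "na.rm" because the data contain missing values
gsva <- gsvaParam(data, gs, use = "na.rm")
ranks <- gsvaRanks(gsva)
gsvaEnrichment(ranks)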

The call to gsvaEnrichment gives the following error: Error in if (any(mask)) { : missing value where TRUE/FALSE needed

However, running gsvaScores(ranks) works fine.

I am using GSVA version 2.0.0

abadgerw avatar Nov 05 '24 17:11 abadgerw

Thanks, I've just fixed the problem with gsvaEnrichment() in both devel and release. The fix is in version 2.0.1, which will become available in the next 24 to 48 hours through the Bioconductor build system; you can get it by updating your Bioconductor packages via BiocManager::install(). If you don't want to wait, you can install this newer release version directly from the GitHub repo by doing:

BiocManager::install("rcastelo/GSVA", ref="RELEASE_3_20")

Let me know if this works for you.
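After updating, one way to confirm that the installed copy includes the fix is to compare its version against 2.0.1. This is a small sketch using only base R and utils; the variable names are illustrative.

```r
# Version that contains the gsvaEnrichment() fix, according to this thread.
required <- package_version("2.0.1")

if (requireNamespace("GSVA", quietly = TRUE)) {
  installed <- utils::packageVersion("GSVA")
  status <- if (installed >= required) "includes" else "predates"
  message("GSVA ", format(installed), " ", status, " the fix")
} else {
  message("GSVA is not installed")
}
```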

rcastelo avatar Nov 06 '24 15:11 rcastelo

Thanks, @rcastelo! I tried it out. I am able to produce a plot using the gsvaEnrichment command. However, it only produces a plot for the first geneset even when I specify a different column or name using the geneset argument.

abadgerw avatar Nov 06 '24 17:11 abadgerw

I cannot reproduce this. For instance, if I do:

library(GSVA)
example(gsvaEnrichment)
gsvaEnrichment(gsvarankspar, geneSet=1)
gsvaEnrichment(gsvarankspar, geneSet=2)
gsvaEnrichment(gsvarankspar, geneSet=3)

the three different calls to gsvaEnrichment() produce three different plots. I need to be able to reproduce the problem to fix it. Could you give more info, such as the lines of code that you are using?

rcastelo avatar Nov 06 '24 18:11 rcastelo

My fault. I see where I went wrong. I set the parameter as geneset rather than geneSet. My apologies. All works on my end!
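A plausible explanation for why the misspelled argument produced no error: R argument names are case-sensitive, and when a function also accepts `...`, an unrecognized name is silently absorbed there, so the default is used instead. The sketch below shows this with a hypothetical stand-in function, `pick_set()`; it assumes, but does not verify, that gsvaEnrichment() accepts `...` in the same way.

```r
# Hypothetical stand-in for a function with a geneSet argument and `...`.
pick_set <- function(x, geneSet = 1, ...) {
  x[[geneSet]]
}

sets <- list(a = "first", b = "second")

pick_set(sets, geneSet = 2)  # "second"
pick_set(sets, geneset = 2)  # "first": `geneset` goes into `...` and is ignored
```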

abadgerw avatar Nov 06 '24 18:11 abadgerw