scGSVA

GSVA calculation takes extremely long

Open jonrot1906 opened this issue 2 years ago • 7 comments

Dear @guokai8,

thanks for your great package. I am currently struggling a little to use it on my dataset, as the GSVA calculation takes extremely long. I am using a custom gene set in this structure:

GeneID | Annot
PTGS2 | Ferroptosis
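
For readers who want to reproduce the input format, here is a minimal base-R sketch of how a two-column annotation table like the one above could be written to and read back from a CSV file. Only the PTGS2 row comes from the post; the other gene names are hypothetical placeholders, and the temporary file path is an assumption for illustration.

```r
# Build a two-column annotation table (GeneID, Annot).
# Only PTGS2 is taken from the post; GENE_B and GENE_C are hypothetical placeholders.
gene_set <- data.frame(
  GeneID = c("PTGS2", "GENE_B", "GENE_C"),
  Annot  = "Ferroptosis"
)

# Write to a temporary location (assumed path, for illustration only).
csv_path <- file.path(tempdir(), "gene_set.csv")
write.csv(gene_set, csv_path, row.names = FALSE)

# Read it back the same way the post does with read.csv().
gene_set_read <- read.csv(csv_path)
str(gene_set_read)
```

Writing with `row.names = FALSE` keeps the file at exactly two columns, so no extra index column appears when it is read back.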

And I am running these commands:

gene_set <- read.csv("gene_set.csv")
res <- scgsva(nft_ad, annot = gene_set, method = "gsva", useTerm = FALSE)

This produces the following console messages (which look fine in my opinion):

Setting parallel calculations through a MulticoreParam back-end
with workers=4 and tasks=100.
Estimating GSVA scores for 1 gene sets.
Estimating ECDFs with Poisson kernels
Estimating ECDFs in parallel on 4 cores

About 21 iterations (I assume cells) took around 12 hours. I am running this on an M1 Pro MacBook with 32 GB RAM. Do you think it will be faster once I switch to a computer with better specifications? I want to run the GSVA analysis on around 100,000 cells, which would take ages at this rate.

I am keen to get your recommendations! Thanks and best regards, Jonas

jonrot1906, Oct 25 '23 09:10

Hi @jonrot1906, I am working on the new version now and will fix this issue soon. Thanks! K

guokai8, Nov 17 '23 16:11

Hi @jonrot1906, I am now testing two approaches: (1) a batch method and (2) a sampling method. I may release the new version in a few days. Best, K

guokai8, Nov 22 '23 19:11

Hi @jonrot1906, the batch method is available now. You can also calculate UCell scores by setting method="UCell". I am now working on the sampling method. K

guokai8, Nov 28 '23 21:11

Dear @guokai8, I faced the same problem when calculating GSVA scores for 80,000 cells × 30,000 genes. Thank you for providing the "batch method" to address this issue; I am going to try it. Could you please explain how the batch method works? I found that GSVA gives different values depending on the number of samples (https://github.com/rcastelo/GSVA/issues/101), which means that if the data are split into parts, the results will differ from those obtained by calculating GSVA scores on the whole dataset directly. Thank you for your help!
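
To make the concern concrete, here is a toy base-R sketch. It is not the GSVA or scGSVA implementation: it uses a plain per-gene empirical CDF as a stand-in for GSVA's kernel-based ECDF step, and the two-batch split is a hypothetical batching scheme, since the thread does not describe how the package's batch method works. It only shows that a statistic computed per gene across cells changes when the cells are split into batches.

```r
# Toy expression matrix: 2 genes x 8 cells (made-up values for illustration).
expr <- rbind(
  geneA = c(1, 2, 3, 4, 5, 6, 7, 8),
  geneB = c(8, 7, 6, 5, 4, 3, 2, 1)
)
colnames(expr) <- paste0("cell", 1:8)

# For each gene (row), replace every value by the fraction of cells
# in the SAME matrix whose value is less than or equal to it.
row_ecdf <- function(m) {
  out <- t(apply(m, 1, function(x) ecdf(x)(x)))
  dimnames(out) <- dimnames(m)
  out
}

# ECDF values using all cells at once.
full <- row_ecdf(expr)

# Hypothetical batching: cells 1-4 and cells 5-8 are processed separately,
# then the per-batch results are joined column-wise.
batches <- list(1:4, 5:8)
batched <- do.call(
  cbind,
  lapply(batches, function(idx) row_ecdf(expr[, idx, drop = FALSE]))
)

# Same shape, but the values are not the same.
print(full)
print(batched)
print(isTRUE(all.equal(full, batched)))
```

In this toy case each batch sees only four cells, so a cell's value is ranked against a different reference population than in the full matrix. Whether the same holds for the package's batch method depends on how it is implemented, which is the question asked above.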

sjasws, Jul 23 '24 19:07