Handle Huge Sparse Matrices

Open julvi opened this issue 3 years ago • 7 comments

Hi @xuranw,

The expression matrix of my reference scRNAseq dataset is huge (27804 genes x 118535 cells) and is readable in R as a dgCMatrix object. Unfortunately, the ExpressionSet function cannot handle the dgCMatrix class:

Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘ExpressionSet’ for signature ‘"dgCMatrix"’

And the as.matrix function cannot convert the dgCMatrix object into a normal matrix:

Error in asMethod(object) : Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 102

Do you have a workaround to run MuSiC on huge expression matrices?
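
For context on the second error, here is a minimal sketch of the arithmetic, assuming (this is not stated in the thread) that the 'problem too large' failure comes from the dense result having more entries than a 32-bit integer index can address. The variable names are only illustrative.

```r
# Dimensions reported above for the reference dataset
n_genes <- 27804
n_cells <- 118535

# Number of entries a dense copy would need (computed in double precision)
n_entries <- n_genes * n_cells

# Compare against the largest 32-bit integer, 2^31 - 1
n_entries > .Machine$integer.max   # TRUE

# Approximate memory for a dense double matrix, in GiB (8 bytes per entry)
n_entries * 8 / 1024^3
```

So even when the conversion succeeds, a dense copy of a matrix this size needs roughly 25 GiB of RAM.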

julvi avatar Oct 04 '21 14:10 julvi

@julvi I am encountering the exact same issue. How did you solve it?

sam0per avatar Mar 14 '22 11:03 sam0per

I have also encountered this problem. Have you solved your problem? @julvi

chenx9 avatar Jul 06 '22 07:07 chenx9

I am currently facing this problem as well. The expression matrix for my scRNAseq reference is 33694 x 92385. Is there any workaround to create the ExpressionSet object required to run MuSiC?

Jospin-Dk avatar Jul 20 '22 11:07 Jospin-Dk

Same problem here. The SingleCellExperiment package handles sparse matrices, but I am not sure it is supported by MuSiC.

IStevant avatar Sep 07 '22 16:09 IStevant

Same problem. Any solutions?

FrankStarling avatar Jan 22 '23 22:01 FrankStarling

Hi All,

Just saw this while passing by. I deal with it this way: convert the matrix part by part and then stitch the parts together. It's not the most optimized piece of code, but it does the job.

## x is the large sparse matrix (dgCMatrix)
## ncol_break is the maximum number of columns in each smaller matrix that is
## converted before combining, so that no single conversion is too large

dGC_to_matrix <- function(x, ncol_break = 49999) {
  total_cols <- ncol(x) ## Total columns in the dgCMatrix
  if (total_cols <= ncol_break) {
    return(as.matrix(x)) ## Small enough to convert in one go
  }
  ## First column of each part: 1, 1 + ncol_break, 1 + 2 * ncol_break, ...
  starts <- seq(1, total_cols, by = ncol_break)
  ## Last column of each part; the final part stops at the last column
  ends <- pmin(starts + ncol_break - 1, total_cols)
  matrix_list <- vector("list", length(starts)) ## One slot per part matrix
  for (i in seq_along(starts)) {
    print(paste0("part_number:", i, ";cols-", starts[i], ":", ends[i]))
    matrix_list[[i]] <- as.matrix(x[, starts[i]:ends[i], drop = FALSE])
  }
  return(do.call(cbind, matrix_list)) ## cbind the parts back together
}

E.g.:
full_mtx <- dGC_to_matrix(full_dgc, 49999)
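
A quick way to check the chunk-and-bind approach on a small matrix before running it on the full data is sketched below. It assumes only the Matrix package; `sparse_to_dense_chunked` is a hypothetical, compact stand-in for the function above (it is not part of MuSiC or Matrix), and the toy dimensions are arbitrary.

```r
library(Matrix)

# Hypothetical helper: densify `x` in column blocks of at most `chunk_cols`
# columns and bind the blocks back together in their original order.
sparse_to_dense_chunked <- function(x, chunk_cols = 50000) {
  starts <- seq(1, ncol(x), by = chunk_cols)
  blocks <- lapply(starts, function(s) {
    e <- min(s + chunk_cols - 1, ncol(x))
    as.matrix(x[, s:e, drop = FALSE])
  })
  do.call(cbind, blocks)
}

# Small random sparse matrix standing in for the real count matrix
set.seed(1)
small_dgc <- rsparsematrix(nrow = 20, ncol = 53, density = 0.1)
dimnames(small_dgc) <- list(paste0("gene", 1:20), paste0("cell", 1:53))

# 53 columns in blocks of 10 gives five full blocks and one block of 3
dense_chunked <- sparse_to_dense_chunked(small_dgc, chunk_cols = 10)
dense_direct <- as.matrix(small_dgc)

# The chunked result should match the direct conversion, dimnames included
all.equal(dense_chunked, dense_direct)
```

Choosing a column count that does not divide evenly by the block size exercises the short final block, which is where off-by-one mistakes tend to show up.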

saeedfc avatar Jan 31 '23 15:01 saeedfc

Since MUSIC2 still uses ExpressionSet as input, and ExpressionSet does not accept sparse dgCMatrix, is there any other way to run MUSIC2 with sparse matrices?

gevro avatar Apr 21 '24 19:04 gevro