Enhancement: Optimize the memory usage for calculating gene score (RP) for scATAC-seq with a large number of cells
I need to process a scATAC-seq dataset with over 48,000 cells, obtained by merging three conditions with two replicates each. I kept the top 200,000 peaks. scATAC_cellranger_count.py finishes successfully, but scATAC_genescore.py keeps getting killed, even on our HPC's high-memory node (260 GB of RAM). We need to optimize the memory usage of this script. This issue will track the progress of that optimization.
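For context, here is a rough sketch of why a job at this scale can exhaust memory, and one common way to keep the footprint down. It is not MAESTRO's actual code. It assumes the gene score is computed as a (genes x peaks) weight matrix multiplied by a (peaks x cells) count matrix, and the names `dense_gib` and `gene_scores_chunked` are hypothetical helpers made up for illustration. Under those assumptions, a single dense float64 matrix of 200,000 peaks x 48,000 cells already takes roughly 71.5 GiB, so a few intermediate copies could plausibly exceed 260 GB.

```python
import numpy as np
from scipy import sparse


def dense_gib(n_rows, n_cols, bytes_per_value=8):
    """Size in GiB of a dense matrix (8 bytes per value = float64)."""
    return n_rows * n_cols * bytes_per_value / 2**30


def gene_scores_chunked(weights, counts, chunk_size=1000):
    """Multiply a sparse (genes x peaks) weight matrix by a sparse
    (peaks x cells) count matrix, one block of cells at a time.

    Hypothetical helper: only one block of cells is multiplied at once,
    and both the inputs and the result stay in sparse form.
    """
    weights = sparse.csr_matrix(weights)  # fast row access for the product
    counts = sparse.csc_matrix(counts)    # fast column (cell) slicing
    blocks = []
    for start in range(0, counts.shape[1], chunk_size):
        block = weights @ counts[:, start:start + chunk_size]
        blocks.append(sparse.csr_matrix(block))
    # Stitch the per-chunk results back together along the cell axis.
    return sparse.hstack(blocks, format="csr")


if __name__ == "__main__":
    # Dense size of the peak-by-cell matrix described in this issue.
    print(f"{dense_gib(200_000, 48_000):.1f} GiB")

    # Tiny random example, purely to show the call shape.
    w = sparse.random(5, 20, density=0.3, random_state=0, format="csr")
    c = sparse.random(20, 13, density=0.2, random_state=1, format="csc")
    print(gene_scores_chunked(w, c, chunk_size=4).shape)
```

Whether chunking by cells helps in practice depends on how sparse the weight matrix really is. If most peaks lie far from any gene, most weights are zero and the sparse product stays small.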
Thanks Tao! We have updated the code to improve the memory efficiency of MAESTRO. Please update and let us know if you are still encountering memory issues. We are also working on support for multiple samples; I will let you know once that is finished.
How is the memory usage now? I have not tested it with such a large number of cells.