Proseg v2 out-of-memory error
Hello,
I was running Proseg v2.0.1 on my Xenium 5k data. For one sample, I encountered an out-of-memory error. I had already given the process 257GB of memory, and rerunning with 300GB still gave the same error.
The data size can be seen from the following Proseg console output:
Using 32 threads
Read 710843563 transcripts
1161757 cells
5101 genes
Estimated full area: 144677150
Full volume: 5230983000
Using grid size 138.91269. Chunks: 11786
I noticed that you recently published a new release of Proseg. I was thinking of trying that version, but the release notes only mention runtime improvements, so I would like to hear your suggestions. Thanks!
Sincerely, Yiming
Hi Yiming,
Unfortunately, 2.0.2 only introduced runtime improvements; memory usage will be similar to 2.0.1. I'm working on some more significant changes to reduce memory usage, but those won't be out for a while.
For now, I'd recommend trying --nbglayers 1, which should have a fairly minimal effect on accuracy but reduce memory usage by 60GB or so on your data.
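For example, the flag can simply be appended to whatever invocation you were already using. This is a minimal sketch assuming the --xenium preset and a transcripts.csv.gz input; substitute your own paths and options:

proseg --xenium transcripts.csv.gz --nbglayers 1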
Thank you @dcjones for your suggestion! I'll definitely try it.
Hello,
I am wondering if @yihming tried --nbglayers 1 and found the results comparable. I have successfully tested Proseg on one of my samples and it improved the segmentation quite a bit, but when trying to apply it to the rest of my samples (Xenium 5k), I am running out of memory despite having 2TB available. I'm running Proseg within a Singularity image. Is reducing --nbglayers still the best solution for this? For reference, I've tried running with 32, 24, and 16 cores, with the same result.
Using 8 threads
Read 142046955 transcripts
273544 cells
5006 genes
Estimated full area: 25948670
Full volume: 1904196500
Using grid size 143.91837. Chunks: 2793
Thanks
Probably --nbglayers 1 is the easiest way to reduce memory usage with the least impact. The number of threads will have very little effect on memory.
I'm a little surprised you are running out of memory on the dataset you posted the output from. 273k cells is not small, but not enormous either. I just tried a 5k dataset with 192k cells on my computer and it used less than 20GB of memory. I'd make sure you're not running into memory limits on your system (the ulimit command should tell you whether there are such user limits).
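For example, these are standard shell builtins; run them in the same shell (or inside the Singularity container) that launches Proseg:

ulimit -a    # list all per-user resource limits
ulimit -v    # max virtual memory in KB; "unlimited" means no cap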
In the slightly longer term, I'm working on a major iteration that should pretty dramatically reduce the memory usage, especially on 5k and other high-plex data.