
Memory usage for generateMSPeakLists()

akogler opened this issue 2 years ago · 1 comment

Hi @rhelmus,

I am experiencing a memory usage issue when running the generateMSPeakLists() command:

    mslists <- generateMSPeakLists(fGroupsSel, algorithm = "mzr", maxMSRtWindow = 5,
                                   precursorMzWindow = NULL, topMost = NULL,
                                   avgFeatParams = avgMSListParams,
                                   avgFGroupParams = avgMSListParams)

I am running the code on Stanford's computing cluster via a container (as per Issue #53). To test the workflow, I am using a dataset of five samples whose mzML files total about 600 MB. When I submit the job requesting 8 cores and 128000 MB of memory, it fails because it exceeds the available memory: peak memory usage reached 127058072K (roughly 121 GiB). I have attached the output file that the job generates, which suggests that the generateMSPeakLists() step completed but that the job then crashed. I am rerunning it with more memory requested, but I suspect there is some underlying issue causing such high memory usage for a relatively small dataset. To help me better understand whether I need to change how the job is configured, I am wondering:

  • How many CPUs should the command be run on?
  • Do you have a sense of typical runtimes and memory usage for a dataset of my size?

I have attached the R script I am running for reference. Thanks for your help!

Attachments: generateMSpeakLists_memory.txt, generateMSpeakLists_script.txt
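In case it is useful for narrowing things down, here is a minimal diagnostic sketch for bracketing the suspect step with memory readings (logMem() is a hypothetical helper, not from the attached script, and base R's gc() only reports R's own allocations, not memory held by child processes):

    # Hypothetical helper: report R's current memory footprint.
    logMem <- function(label) {
        gcInfo <- gc()  # base R; column 2 holds Ncells/Vcells usage in Mb
        message(sprintf("[%s] R memory in use: %.0f Mb", label, sum(gcInfo[, 2])))
    }

    logMem("before generateMSPeakLists")
    mslists <- generateMSPeakLists(fGroupsSel, algorithm = "mzr", maxMSRtWindow = 5,
                                   precursorMzWindow = NULL, topMost = NULL,
                                   avgFeatParams = avgMSListParams,
                                   avgFGroupParams = avgMSListParams)
    logMem("after generateMSPeakLists")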

— akogler, Oct 14 '22 20:10

Hi @akogler,

Just a quick reply.

At the moment generateMSPeakLists() is not yet parallelized, so one core should do for this step. While the peak lists may take a fair amount of memory, I usually don't see usage go beyond a GiB or so, so problems like this should not occur.
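For the steps that are parallelized, the number of concurrently running external processes can be capped with patRoon's multiprocessing options; a minimal sketch, assuming the defaults described in the handbook (patRoon.MP.maxProcs normally defaults to the number of detected cores):

    # Cap patRoon's multiprocessing for the steps that do parallelize
    # (e.g. findFeatures() with external tools).
    options(patRoon.MP.method = "classic")  # the default process-based backend
    options(patRoon.MP.maxProcs = 1)        # at most one external process at a time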

I noticed you had >400,000 feature groups; that is probably not what you want. Could it be that your data is not centroided? (patRoon tries to warn you if it isn't, but the detection may perhaps fail.) Otherwise, you most likely need to raise the intensity thresholds in e.g. findFeatures(). I usually aim for at most a few thousand features per analysis, and that is before any filtering steps have occurred.
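As a rough sketch of both suggestions (the file paths, threshold values, and the choice of the OpenMS algorithm are my assumptions here, not taken from your script):

    # Centroid the raw data with ProteoWizard's msconvert (vendor peak picking).
    convertMSFiles("raw/", "mzml/", to = "mzML", algorithm = "pwiz",
                   centroid = "vendor")

    # Raise the intensity threshold of the feature finder.
    fList <- findFeatures(anaInfo, "openms", noiseThrInt = 1E4)

    # Optionally prune remaining low-intensity features before grouping.
    fList <- filter(fList, absMinIntensity = 1E4)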

Thanks, Rick

— rickhelmus, Oct 14 '22 21:10

Closed due to inactivity, feel free to re-open!

— rickhelmus, Aug 08 '23 13:08