
Out of memory error

Open SWittouck opened this issue 4 years ago • 6 comments

Dear Zhemin,

Thank you for making PEPPA publicly available and for putting the publication on bioRxiv. It's a very nice read!

I managed to install PEPPA successfully and tried to do a test run on 73 genomes of the order Lactobacillales. After a few minutes I got an out of memory error (memory was indeed full) and the job aborted. Is there anything I can do to solve this? I have 16GB of memory and was using all 16 threads I have available.

Best wishes, Stijn

SWittouck avatar Apr 10 '20 12:04 SWittouck

Because of the limitations of multi-threading in Python, part of the parallel calculation is handled by multiple processes, and all data in memory is replicated in each process. Please try running PEPPA with fewer processes (e.g., 4). I will close this issue for now, but please re-open it if you still get an out-of-memory problem.
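
As a rough illustration of the advice above: if every worker process holds its own copy of the data, peak memory grows roughly linearly with the process count. The sketch below is a back-of-the-envelope estimate, not PEPPA code. The function name, the 1 GB reserve, and the figure of 2 GB per process are all hypothetical and only serve to show the arithmetic.

```python
def max_worker_processes(total_memory_gb, per_process_gb, reserve_gb=1.0):
    """Estimate how many processes fit if each holds a full copy of the data.

    Assumption: memory use scales linearly with the number of processes,
    the worst case implied by "replicated in each process".
    """
    if per_process_gb <= 0:
        raise ValueError("per_process_gb must be positive")
    usable_gb = total_memory_gb - reserve_gb  # leave room for the OS
    # Always allow at least one process, even if it may not fit.
    return max(1, int(usable_gb // per_process_gb))


# Hypothetical numbers: a 16 GB machine and about 2 GB of data per process.
print(max_worker_processes(16, 2))
```

If a single process already needs more than the available memory, lowering the process count cannot help, which would be consistent with an error that persists even with one thread.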

zheminzhou avatar Apr 10 '20 21:04 zheminzhou

Dear Zhemin,

Thank you for your suggestion, I will try this.

Best wishes, Stijn

SWittouck avatar Apr 11 '20 14:04 SWittouck

Dear Zhemin,

I tried to run with fewer threads, as you suggested, even down to a single thread. Unfortunately, the issue remained. I have attached the log file with the error; it seems to occur in the BLASTn step.

Best wishes, Stijn

Attachment: peppa.log

SWittouck avatar Apr 12 '20 05:04 SWittouck

I have pushed PEPPA to PyPI with a formal version number, 1.0. The code in this version has been revisited to optimize memory performance. You can install it in Python 3 (>=3.5) via pip install bio-peppa, and the executable is 'PEPPA' by default. I hope this solves the memory leak problem.

zheminzhou avatar Apr 22 '20 14:04 zheminzhou

Hi Zhemin,

I installed PEPPA version 1.0 using pip, as you suggested. It didn't fix the problem: I still got out-of-memory errors, no matter how many threads I used. However, I took a closer look at how PEPPA works, and it seems to me that it is not suited to datasets above the genus level. My genome dataset is at the order level, and I think the blastn searches are not sensitive enough for that. When I set --clust_identity to 0.5, --clust_match_prop to 0.6 and --match_identity to 0.5, the error no longer occurred! So I'm still not sure what caused the error, and I think my dataset is outside the scope of PEPPA anyway, but at least the error is resolved. Thank you for your help!

I have one additional remark: I found a bug in PEPPA_parser.py. In line 64, there is one ] too many.

Best regards, Stijn

SWittouck avatar Apr 23 '20 09:04 SWittouck

Thank you for the bug report (again) and the solution you found. PEPPA allows "--match_identity" to go as low as 0.4, so your value of 0.5 is fine. However, the "clust_identity" and "clust_match_prop" values are certainly outside the range I have tested. I think the phylogeny-based paralog splitting will still be able to handle this, but I am not sure.

I will push the fix for the bug in PEPPA_parser.py later this week.
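
For anyone wanting to reproduce the settings discussed here, the following is a minimal sketch that assembles the three options named in this thread and checks the 0.4 lower limit quoted above. The helper name `peppa_options` is hypothetical and not part of PEPPA. Input files and any other arguments are deliberately left out, since the thread does not show the full command line.

```python
# Lower limit for --match_identity as quoted in this thread.
MIN_MATCH_IDENTITY = 0.4


def peppa_options(clust_identity, clust_match_prop, match_identity):
    """Build the option part of a PEPPA command line as a list of strings.

    Hypothetical helper: it only formats the three flags mentioned in
    this thread and does not run PEPPA itself.
    """
    if match_identity < MIN_MATCH_IDENTITY:
        raise ValueError("--match_identity must be at least 0.4")
    return [
        "--clust_identity", str(clust_identity),
        "--clust_match_prop", str(clust_match_prop),
        "--match_identity", str(match_identity),
    ]


# Values reported to avoid the out-of-memory error on the order-level dataset.
options = peppa_options(0.5, 0.6, 0.5)
print(" ".join(["PEPPA"] + options))
```

The remaining arguments (input genomes, output prefix, and so on) would still need to be appended according to the PEPPA documentation.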

zheminzhou avatar Apr 24 '20 08:04 zheminzhou