MotifScan

Memory consumption issue

Open • Mr-Milk opened this issue 4 years ago • 7 comments

I tried to scan motifs on genome regions (hg38 build) with -t 18, matching my number of CPU cores, but it raised:

OSError: [Errno 12] Cannot allocate memory

Then I tried -t 8, and the program used up to around 50 GB of RAM. I ran it on WSL2 (Ubuntu 20.04 LTS).


Mr-Milk, Dec 08 '20 04:12

Sorry, but how many regions were scanned?

hongduosun, Dec 08 '20 05:12

More than 200K regions.

Mr-Milk, Dec 08 '20 05:12

I'm afraid this is a current limitation of MotifScan: only a small part of the code has been rewritten in C to speed up motif score calculation. As a result, every single motif score is stored and passed back to Python, which requires O(n_region * length_per_region * n_motif) memory. I'll improve this in the next update.

hongduosun, Dec 08 '20 06:12
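To get a feel for the O(n_region * length_per_region * n_motif) scaling described above, here is a back-of-the-envelope sketch. The helper names, the 8 bytes per score (one float64 each), and the region length and motif count used in the example are assumptions for illustration only, not values taken from MotifScan.

```python
def estimate_score_memory(n_region, length_per_region, n_motif, bytes_per_score=8):
    """Bytes needed to keep every per-position motif score in memory.

    Follows the O(n_region * length_per_region * n_motif) scaling;
    bytes_per_score=8 assumes one float64 per score (an assumption).
    """
    return n_region * length_per_region * n_motif * bytes_per_score


def format_gib(n_bytes):
    """Render a byte count in GiB with one decimal place."""
    return f"{n_bytes / 2**30:.1f} GiB"


# Hypothetical inputs: 200,000 regions (as in this issue), 1,000 bp per
# region and 100 motifs. Region length and motif count are illustrative.
total = estimate_score_memory(200_000, 1_000, 100)
print(format_gib(total))  # prints 149.0 GiB
```

Because the three factors multiply, doubling any one of them doubles the footprint.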

Thanks for your answer. Just a small suggestion: looking at your code, the parallelism uses Python's multiprocessing, which might be the reason for such huge memory consumption, since each worker is basically a copy of the current Python process. It might help to implement the parallelism on the C side instead.

Mr-Milk, Dec 08 '20 09:12
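The contrast between process-based and thread-based parallelism can be sketched in pure Python with a thread pool: every thread reads the same `regions` list in one address space, whereas multiprocessing workers are separate processes with their own memory and return results to the parent by pickling. `count_motif` and `parallel_counts` are hypothetical stand-ins, not MotifScan functions. Note that in pure Python the GIL keeps these threads from running the loop truly in parallel; the speedup only appears when the inner kernel runs in C without holding the GIL, which is roughly the motivation for threading on the C side.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def count_motif(seq: str, motif: str) -> int:
    """Count (possibly overlapping) exact occurrences of motif in seq.

    Hypothetical stand-in for a real scoring kernel.
    """
    return sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))


def parallel_counts(regions: List[str], motif: str, n_threads: int) -> List[int]:
    """Scan all regions with a pool of threads, preserving input order."""
    # Threads share one address space: `regions` is read in place by every
    # worker rather than being copied into separate worker processes.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(count_motif, regions, [motif] * len(regions)))


counts = parallel_counts(["ACGACG", "TTTT", "ACGT"], "ACG", n_threads=2)
print(counts)  # prints [2, 0, 1]
```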

Thanks a lot for your advice!

hongduosun, Dec 08 '20 15:12

This has been fixed in v1.3.0 by using pthreads in the C extension. Thanks again!

hongduosun, Jan 21 '21 14:01

I tried it with the same dataset. At some point the program still used up all of my RAM and caused a system exit 😥, but it's definitely better than before 🥰. Would it be possible to free unused memory, or to save results to disk while the scan is running?

Mr-Milk, Jan 22 '21 07:01
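One way to bound memory along the lines of this last question is to score regions one at a time and write hits to disk immediately, so only one region's scores are held at once. The sketch assumes a simple log-odds PWM stored as one dict per position, a fixed score threshold, and forward-strand scanning only; `score_region`, `scan_to_file` and the toy motif are hypothetical, not MotifScan's API or data.

```python
import io
import math
from typing import Dict, Iterable, List, TextIO, Tuple

# Hypothetical position weight matrix: one {base: log-odds} dict per position.
PWM = List[Dict[str, float]]


def score_region(seq: str, pwm: PWM) -> List[float]:
    """Log-odds score of every window of len(pwm) on the forward strand."""
    width = len(pwm)
    scores = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # Bases missing from the PWM (e.g. 'N') contribute -inf
        scores.append(sum(col.get(base, -math.inf) for col, base in zip(pwm, window)))
    return scores


def scan_to_file(regions: Iterable[Tuple[str, str]], pwm: PWM,
                 threshold: float, out: TextIO) -> int:
    """Score regions one at a time and write only hits to `out`.

    Peak memory is bounded by one region's scores instead of all regions.
    Returns the number of hits written.
    """
    n_hits = 0
    for name, seq in regions:
        for pos, score in enumerate(score_region(seq, pwm)):
            if score >= threshold:
                out.write(f"{name}\t{pos}\t{score:.3f}\n")
                n_hits += 1
        # This region's score list is no longer referenced after this point
    return n_hits


# Toy 3-bp motif favouring "ACG" (illustrative values only)
pwm = [{"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
       {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
       {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0}]
buf = io.StringIO()
n = scan_to_file([("r1", "TACGT"), ("r2", "ACGACG")], pwm, 3.0, buf)
print(n)  # prints 3
```

In a real run, `out` would be a file opened with `open("hits.tsv", "w")` and `regions` a generator that reads sequences lazily, so neither the inputs nor the full score matrix need to sit in memory at the same time.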