nanocompore Threads over-subscription

Open tleonardi opened this issue 5 years ago • 7 comments

It looks like nanocompore sometimes spawns more threads than it should. Starting it with nthreads=4 on the 7SK IVT data starts 16 threads.

Oct 11 '18 16:10 tleonardi

I have to look into the issue more carefully. I'm not sure why or when it happens, but it has happened more than once already. @a-slide do you have any idea?

Oct 11 '18 16:10 tleonardi

OK, it looks like I figured it out. On my system numpy is built against OpenBLAS, which is multithreaded by default. As a result, the np.array() call in __process_references() spawns multiple threads (and every worker process does the same). Since we are using multiprocessing, the solution seems to be to disable multithreading for OpenBLAS and MKL before importing numpy:

import os
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["MKL_THREADING_LAYER"] = "sequential"
os.environ["NUMEXPR_NUM_THREADS"] = "1"
os.environ["OMP_NUM_THREADS"] = "1"

I'm currently testing whether it works as it should. I will commit as soon as I'm sure all is fine.

Oct 12 '18 10:10 tleonardi

I had not noticed, but my version is actually also built against OpenBLAS. I had a quick look as well, and it looks like the method you describe should work, but you might want to include this as well:

os.environ['OPENBLAS_NUM_THREADS'] = '1'
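
For reference, the variables suggested so far could be gathered into one helper that runs before numpy is first imported. This is only a sketch: the name limit_blas_threads and the THREAD_ENV_VARS tuple are made up for illustration and are not part of nanocompore, and it assumes the backends read these variables when they are first loaded.

```python
import os

# Environment variables mentioned in this thread; the native libraries
# are assumed to read them only when they are first loaded.
THREAD_ENV_VARS = (
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
    "NUMEXPR_NUM_THREADS",
    "OMP_NUM_THREADS",
)


def limit_blas_threads(n_threads=1):
    """Hypothetical helper: cap native thread pools via environment variables."""
    for var in THREAD_ENV_VARS:
        os.environ[var] = str(n_threads)
    if n_threads == 1:
        # Same setting as in the snippet above: force MKL's sequential layer.
        os.environ["MKL_THREADING_LAYER"] = "sequential"
    # Return the values that were set, which is handy for logging.
    return {var: os.environ[var] for var in THREAD_ENV_VARS}


limit_blas_threads(1)
# import numpy as np  # import numpy only after the call above
```

Worker processes started with multiprocessing inherit the parent's environment, so calling this once at the top of the entry point, before any numpy import, should be enough in principle.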

Oct 12 '18 12:10 a-slide

Not completely fixed apparently. Numpy is still causing issues in a cluster environment. An option to explore might be to use this package to set the number of threads: https://github.com/joblib/threadpoolctl
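
A rough sketch of what using threadpoolctl could look like, in case it helps. The threadpool_limits context manager is part of the threadpoolctl API; the wrapper run_with_blas_limit is a hypothetical name and not something that exists in nanocompore. The import is guarded so the snippet still runs where the package is not installed, in which case no limit is applied.

```python
try:
    from threadpoolctl import threadpool_limits
except ImportError:  # threadpoolctl is an optional third-party package
    threadpool_limits = None


def run_with_blas_limit(func, *args, limit=1, **kwargs):
    """Hypothetical helper: call func with BLAS thread pools capped at `limit`."""
    if threadpool_limits is None:
        # Fallback when threadpoolctl is missing: run without any runtime limit.
        return func(*args, **kwargs)
    # Limits apply to BLAS libraries already loaded in this process
    # and are restored when the block exits.
    with threadpool_limits(limits=limit, user_api="blas"):
        return func(*args, **kwargs)
```

Unlike the environment variables, this works at runtime, so it does not depend on being set before numpy is imported. Each worker process would need to apply the limit itself, for example by wrapping the function it executes.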

May 17 '19 14:05 a-slide

This should be fixed by #94 but I haven't tested it yet. Did you?

Jun 11 '19 13:06 tleonardi

I think it's fixed

Jun 21 '19 14:06 tleonardi

But it's not, reopening.

Jul 09 '19 16:07 tleonardi
