Multithreading problem in some modules/plugins when running CellProfiler in a Docker container.
Describe the bug
We have a problem with CellProfiler starting too many threads when running in a Docker container.
We are running several Docker containers with CellProfiler on our server and want to limit each of them to one or a few CPUs. When CellProfiler is launched, it creates as many threads as there are CPUs on the server. Our server has 64 processors (128 logical with hyperthreading), so CellProfiler starts 128 threads even though only one processor is available in the container. The overhead of splitting the calculations across so many threads is substantial: the runtime is about 5 times longer than when the thread count matches the number of available CPUs. When running in a Docker container, it is apparently a known issue that some Python/Java APIs for querying the number of available CPUs report the CPUs of the underlying OS rather than what is available in the container. See for example: https://bugs.python.org/issue36054
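To illustrate the mismatch described above, here is a minimal sketch that compares three views of the CPU count from inside a container. It assumes Linux with cgroup v2, where the quota set by "--cpus" is expected to appear in /sys/fs/cgroup/cpu.max; the helper names parse_cpu_max and container_cpu_limit are made up for this example and are not part of CellProfiler.

```python
import os
from typing import Optional


def parse_cpu_max(text: str) -> Optional[float]:
    """Parse the contents of a cgroup v2 cpu.max file ("<quota> <period>")."""
    quota, period = text.split()[:2]
    if quota == "max":
        return None  # no CPU quota configured
    return int(quota) / int(period)


def container_cpu_limit(path: str = "/sys/fs/cgroup/cpu.max") -> Optional[float]:
    """Return the CPU quota in cores, or None if it cannot be determined.

    The default path is an assumption (cgroup v2); cgroup v1 hosts use
    different files and will simply yield None here.
    """
    try:
        with open(path) as f:
            return parse_cpu_max(f.read())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    # Number of logical CPUs the OS reports, regardless of any quota
    print("os.cpu_count():", os.cpu_count())
    # CPUs this process may be scheduled on (Linux only)
    if hasattr(os, "sched_getaffinity"):
        print("CPUs in affinity mask:", len(os.sched_getaffinity(0)))
    # Quota in cores, if one is set and readable
    print("cgroup CPU quota:", container_cpu_limit())
```

As far as I understand, "--cpus" only sets a scheduler quota and leaves the affinity mask untouched, while "--cpuset-cpus" narrows the affinity mask. That would be consistent with thread pools sizing themselves to the host CPU count unless the container is pinned.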
We have tested "pinning" a single specific CPU to the Docker container with "--cpus=1 --cpuset-cpus=0", and then it works correctly with only one thread. Unfortunately, pinning the container to one specific CPU is not an option when running an automated analysis pipeline on our images.
It looks like the threads are started by some static code in CellProfiler when the Python libraries/modules are imported. I tested this by halting the code with a breakpoint on the first line of CellProfiler's __main__, and the threads show up already before that breakpoint (at least that is how it looks when observing the threads with htop). Could you maybe point me to where in the code the threads are launched?
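One way to narrow down which import starts the threads is to compare the OS-level thread count of the process before and after each import. The following is a rough sketch, assuming Linux (it reads the "Threads:" field of /proc/self/status); the function names are made up for this example. Python's threading module would not help here, since it does not see threads created by native libraries.

```python
import importlib
import sys
from typing import Optional


def parse_thread_count(status_text: str) -> Optional[int]:
    """Extract the value of the "Threads:" field from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    return None


def native_thread_count() -> Optional[int]:
    """Return the number of OS threads in this process, or None if unavailable."""
    try:
        with open("/proc/self/status") as f:
            return parse_thread_count(f.read())
    except OSError:
        return None  # not on Linux, or /proc is not mounted


def report_import(module_name: str) -> None:
    """Import a module and print the thread count before and after."""
    before = native_thread_count()
    importlib.import_module(module_name)
    after = native_thread_count()
    print(f"{module_name}: threads {before} -> {after}")


if __name__ == "__main__":
    # Module names are taken from the command line, in import order
    for name in sys.argv[1:]:
        report_import(name)
```

Saved under a name of your choice and run with a list of module names as arguments, this prints one line per import. A large jump in the count points at the module (or one of its dependencies) that creates the thread pool.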
For us the most problematic modules are the long-running MeasureColocalization module and the Cellpose plugin.
A possible workaround could be an option to set the thread count manually via an environment variable or command-line parameter.
To Reproduce
I don't have a ready-to-go test case, but if you don't already know what code is causing the issue, I can try to post a detailed setup. If you can point me to the code that is starting the threads (even before __main__ is run), then I can investigate and help find a workaround/fix.
Version
We are running CellProfiler 4.2.1 in headless mode from the command line, inside your Docker container: https://hub.docker.com/r/cellprofiler/cellprofiler/tags
I found the problematic code: it is the scipy.ndimage package (used extensively in CellProfiler) that is starting all the threads. The number of threads can be controlled with the environment variable OMP_NUM_THREADS, e.g. OMP_NUM_THREADS=1
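For anyone who would rather set this from Python than from the shell, here is a minimal sketch. It assumes the variable must be in the environment before the numerical libraries are first imported, because native thread pools typically read it once at load time. Only OMP_NUM_THREADS is confirmed above; OPENBLAS_NUM_THREADS and MKL_NUM_THREADS are the corresponding variables for OpenBLAS and MKL, and whether they matter for a given CellProfiler install is an assumption. The function name is made up for this example.

```python
import os
from typing import MutableMapping

# OMP_NUM_THREADS is the variable confirmed in this thread; the other two
# are included as an assumption, for builds that use OpenBLAS or MKL.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def limit_native_threads(n: int, env: MutableMapping[str, str] = os.environ) -> None:
    """Set thread-count variables to n unless the user already set them."""
    for name in THREAD_ENV_VARS:
        # setdefault keeps any value passed in from outside, e.g. via docker run -e
        env.setdefault(name, str(n))


# Apply the limit first ...
limit_native_threads(1)
# ... and import numpy, scipy, cellprofiler, etc. only after this point.
```

Using setdefault means an explicit value from the environment still wins, so the same container image can be started with a different thread count, for example with "-e OMP_NUM_THREADS=2" on the docker run command line.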
This issue has been mentioned on Image.sc Forum. There might be relevant details there:
https://forum.image.sc/t/setting-number-of-workers-in-headless-mode/61621/4
This bit me today too - I had a container limited to 2 CPUs but running on a 96-CPU VM. Saving some very small images was taking minutes instead of ~1 s.
I confirmed setting OMP_NUM_THREADS fixed the issue for me as well - is this something CellProfiler could/should do upon startup?