StainTools
MultiProcessing
Hello,
I want to use multiple CPUs to accelerate the function. When I use your class with multiprocessing, the kernel just freezes and does not do anything.
Can you help with this?
Thanks,
I'm having the same issue. @Amiiirali did you already find a way to fix this?
I'm using PyTorch for data loading and transformations. I've added the Macenko normalization to my transformation pipeline. When I set workers=0 in my dataloader, it works well. However, whenever I increase the workers, it freezes when calling the transform() function.
====== EDIT =======
I've dug deeper, and I've noticed it's breaking inside spams.py, in the lasso() function. One thing I've noticed is that this function, by default, sets numThreads=-1, which in the C++ backend means it uses all available processes/threads, as mentioned in the docstring:
numThreads: (optional, number of threads for exploiting
multi-core / multi-cpus. By default, it takes the value -1,
which automatically selects all the available CPUs/cores).
This probably causes contention when the call is spawned from multiple processes at once. When I change it to numThreads=1, it works as expected.
@Peter554 I see that you have a stale branch where you removed the SPAMS dependency and used the sklearn lasso function. Is this stable, or unfinished functionality? Either way, would it be worth creating a PR that adds arguments to the relevant functions so that numThreads=1 ends up being passed to the SPAMS lasso regression?
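For reference, a sklearn-based replacement along those lines might look roughly like the sketch below. This is only an illustration, not the code from that branch: the function name get_concentrations and the toy stain matrix are assumptions, and note that sklearn's Lasso scales its penalty by the number of samples, so the regularizer value would need rescaling to match SPAMS exactly.

```python
import numpy as np
from sklearn.linear_model import Lasso

def get_concentrations(OD, stain_matrix, regularizer=0.01):
    """Rough sklearn stand-in for
    spams.lasso(X=OD.T, D=stain_matrix.T, mode=2, lambda1=regularizer, pos=True).

    positive=True mirrors SPAMS' pos=True (non-negative coefficients).
    """
    model = Lasso(alpha=regularizer, positive=True, fit_intercept=False)
    # dictionary (stains x channels) transposed is the design matrix;
    # each pixel's OD vector is one regression target
    model.fit(stain_matrix.T, OD.T)
    return model.coef_  # shape: (n_pixels, n_stains)

# toy usage: 2 stains, 3 OD channels, 5 pixels
np.random.seed(0)
stain_matrix = np.array([[0.65, 0.70, 0.29],
                         [0.07, 0.99, 0.11]])
true_C = np.random.rand(5, 2)
OD = true_C @ stain_matrix  # (5, 3)
C = get_concentrations(OD, stain_matrix, regularizer=1e-4)
print(C.shape)  # (5, 2)
```

Since this path avoids SPAMS entirely, it also sidesteps the numThreads issue.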
@YoniSchirris Not really. I spent two days on it, but there is no solution. The problem is the spams library: as soon as you use SPAMS inside a multiprocessing process, it instantly freezes. I don't know the details, but it probably comes down to the C++ code of that library.
About numThreads=-1: what do you mean? If we set that, will it use all the processes?
Please keep me updated if you find a solution.
@Amiiirali By default it sets numThreads=-1, which means it will try to use all the CPUs available. This clashes with PyTorch, which is also managing multiple processes. If you change it to numThreads=1, it will work.
So go to e.g. ~/miniconda3/envs/<envname>/lib/python3.7/site-packages/staintools/miscellaneous/get_concentrations.py, and change spams.lasso(X=OD.T, D=stain_matrix.T, mode=2, lambda1=regularizer, pos=True).toarray().T to spams.lasso(X=OD.T, D=stain_matrix.T, mode=2, lambda1=regularizer, pos=True, numThreads=1).toarray().T.
It's probably safest to make a new file in your own repo that overrides this one, rather than editing the installed package in place. Let me know if this solves it for you; it works for me.
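One generic way to do that override without copying the whole file is to wrap the function at import time so the keyword is always forced. A minimal sketch of the pattern, using a stand-in function here since spams itself may not be importable; in practice you would wrap spams.lasso and assign it back with spams.lasso = force_kwargs(spams.lasso, numThreads=1) before staintools runs:

```python
import functools

def force_kwargs(func, **forced):
    """Return a wrapper that always passes the given keyword arguments."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        kwargs.update(forced)  # override whatever the caller passed (or omitted)
        return func(*args, **kwargs)
    return wrapper

# stand-in for spams.lasso, which defaults to numThreads=-1
def fake_lasso(X, numThreads=-1):
    return numThreads

patched = force_kwargs(fake_lasso, numThreads=1)
print(patched("data"))  # 1
```

The wrapping has to happen before any worker processes are spawned, so that each PyTorch worker inherits the patched function.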