miniwdl
Deadlock encountered when trying to run task-intensive workflows.
Hi,
I am currently testing the somatic pipeline with 35 samples, and that manages to consistently deadlock miniwdl somewhere. It is very hard to debug. With pyflame I managed to see that, while spinning, the process is on this line: https://github.com/chanzuckerberg/miniwdl/blob/main/WDL/runtime/workflow.py#L906 However, no tasks are started and I see no subprocesses running.
I can't really figure out where it goes wrong. I am working on compiling a minimal test dataset that reproduces the issue.
Instructions: deadlock-workflow.tar.gz
wget https://github.com/chanzuckerberg/miniwdl/files/9349257/deadlock-workflow.tar.gz
mkdir test-workflow
cd test-workflow
tar -xvzf ../deadlock-workflow.tar.gz
MINIWDL__FILE_IO__ALLOW_ANY_INPUT=true miniwdl run -i tests/integration/Somatic_PairedEnd.json somatic.wdl
This will reliably deadlock, and it is not backend dependent: I have had the same issue using singularity and Docker Swarm. EDIT: The files in this test are mostly the same, but that is not the cause of the issue; it was originally discovered working on 35 samples that were all different.
As a workaround it will probably help to increase the [scheduler] subworkflow_concurrency setting. I can easily believe there's a bug in the logic that is supposed to prevent a deadlock when that setting is a smaller value, though... will look into it.
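For reference, the same setting can also be raised in a miniwdl configuration file rather than through the MINIWDL__SCHEDULER__SUBWORKFLOW_CONCURRENCY environment variable used below; the value 40 here is just an illustrative guess, not a recommendation:

[scheduler]
subworkflow_concurrency = 40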
Idea: lazily initialize a distinct thread pool each time we see a new subworkflow nesting level (len(run_id_stack)).
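To make the idea concrete, here is a minimal sketch (names like pool_for_depth and submit_subworkflow are made up for illustration, not miniwdl's actual internals) of lazily creating one executor per nesting depth keyed by len(run_id_stack):

import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch: one executor per subworkflow nesting depth, created lazily.
_pools_lock = threading.Lock()
_pools = {}  # depth -> ThreadPoolExecutor

def pool_for_depth(depth, max_workers=8):
    # Return the executor for this nesting depth, creating it on first use.
    with _pools_lock:
        if depth not in _pools:
            _pools[depth] = ThreadPoolExecutor(max_workers=max_workers)
        return _pools[depth]

def submit_subworkflow(run_id_stack, fn, *args):
    # Depth comes from the run id stack, so nested subworkflows never compete
    # with their ancestors for the same workers.
    return pool_for_depth(len(run_id_stack)).submit(fn, *args)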
MINIWDL__SCHEDULER__SUBWORKFLOW_CONCURRENCY=8 MINIWDL__FILE_IO__ALLOW_ANY_INPUT=true miniwdl run -i tests/integration/Somatic_PairedEnd.json somatic.wdl
-> Deadlocks (8 is my core count).
MINIWDL__SCHEDULER__SUBWORKFLOW_CONCURRENCY=40 MINIWDL__FILE_IO__ALLOW_ANY_INPUT=true miniwdl run -i tests/integration/Somatic_PairedEnd.json somatic.wdl
-> The workflow continues.
So your guess was indeed correct.
I think I have found the reason: I have multiple levels of subworkflows. The sample workflow starts a QC workflow; the reason for this organisation is that the QC workflow is used in almost all pipelines.
So what happens: there are 35 samples. Right off the bat 8 sample workflows are started. So far so good, but after that the QC workflows cannot be started because the thread pool is full, while the sample workflows cannot continue because they depend on the QC workflows. Deadlock.
To test this theory I ran with a subworkflow_concurrency of 34, one less than the number of samples, and indeed that locks up too.
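A standalone toy example (not miniwdl code, just the same starvation pattern reproduced with a plain concurrent.futures pool) illustrates why this locks up; note that running it hangs by design:

from concurrent.futures import ThreadPoolExecutor

# 8 workers and 35 "sample workflows" that each block on a nested "QC workflow"
# submitted to the same pool: the first 8 parents occupy every worker and wait
# in .result(), so no worker is ever free to run a QC job. This script hangs.
pool = ThreadPoolExecutor(max_workers=8)

def qc_workflow(sample):
    return f"qc({sample})"

def sample_workflow(sample):
    # The parent waits for its child, but the child also needs a free worker.
    return pool.submit(qc_workflow, sample).result()

futures = [pool.submit(sample_workflow, i) for i in range(35)]
print([f.result() for f in futures])  # never reached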
Nested subworkflows are a thing. I think the easiest solution is that only one nested subworkflow can run per subworkflow, and that it inherits the thread pool from the calling subworkflow, though that will run into scalability issues where the workflow is modelled as 10 base units -> 100 subtasks -> 1000 subsubtasks -> 10000 subsubsubtasks. Another option is to allocate the subworkflow threads depth-first instead of breadth-first. EDIT: Yet another is to allow UNLIMITED threads. Why not? Let the scheduler handle the tasks. That seems the easiest and most robust solution.
I did some further research, and the number of threads Python can handle is indeed limited. Using the script mentioned here: https://stackoverflow.com/questions/48970857/maximum-limit-on-number-of-threads-in-python my PC reports a maximum of roughly 30,000 threads. So brute-forcing the number of threads is not a robust solution that will always save the day. I think the depth-first approach will work much better: even a 10-level nested workflow will always finish if there are at least 10 threads available, regardless of the number of invocations.
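For anyone who wants to check their own machine, a rough approximation of such a probe (not the exact StackOverflow script) keeps starting idle threads until the interpreter refuses; expect it to be memory-hungry:

import threading

def probe_thread_limit():
    # Start sleeping threads until the OS/interpreter refuses to create more.
    stop = threading.Event()
    threads = []
    try:
        while True:
            t = threading.Thread(target=stop.wait)
            t.start()
            threads.append(t)
    except RuntimeError:  # "can't start new thread"
        return len(threads)
    finally:
        stop.set()
        for t in threads:
            t.join()

if __name__ == "__main__":
    print(probe_thread_limit())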
@rhpvorderman Yes, all basically correct. There are some heuristics right now that are supposed to have the effect of prioritizing the more deeply-nested subworkflow calls, but it's hacky (and evidently not always effective) because Python's built-in ThreadPoolExecutor doesn't really support a priority queue. I have an idea to initialize a new thread pool for each nesting level, which should still keep the total thread count reasonably under control. Alternatively, we might need to look around on PyPI for a fancier, permissively-licensed thread pool that supports a true priority queue.
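As a sketch of that priority-queue alternative (illustrative only, not a drop-in ThreadPoolExecutor replacement and not miniwdl code), a tiny pool built on queue.PriorityQueue can always dequeue the most deeply nested submission first:

import itertools
import queue
import threading

class DepthFirstPool:
    # Workers always pick the deepest pending call first, so leaf subworkflows
    # can finish and unblock their parents.
    def __init__(self, workers=8):
        self._q = queue.PriorityQueue()
        self._seq = itertools.count()  # tie-breaker; avoids comparing callables
        for _ in range(workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, depth, fn, *args):
        # Negate depth so deeper nesting sorts first in the priority queue.
        self._q.put((-depth, next(self._seq), fn, args))

    def _worker(self):
        while True:
            _, _, fn, args = self._q.get()
            try:
                fn(*args)
            finally:
                self._q.task_done()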
I have an idea to initialize a new thread pool for each nesting level, which should still keep the total thread count reasonably under control.
That seems to be a proper solution. It seems to me that this design, which has nesting_levels * cpu_cores threads, is reasonable, and that it is no longer possible to run into deadlocks. At least you have a proper test case now. I will try to trim it down a bit with miniwdl zip and provide it here again.
EDIT: minimal_deadlock.tar.gz
Fixed in #593