add support for concurrent job limits

Open kortschak opened this issue 4 years ago • 6 comments

Please take a look.

Fixes #38.

kortschak avatar Nov 08 '19 12:11 kortschak

The problem with this is that it can really easily deadlock. To run dependencies last to first, mage just fires off a bunch of goroutines and waits for them all to exit. It will do this several levels deep, so you'll often have N goroutines firing off. So if you have T1->T2->T3 and j is 1, T2 will use up a slot and then it'll deadlock when it wants to start T3 but there's no more slots.

I don't really think this is a huge problem. This is limited by how deep your dependency tree is, as defined by calling mg.Deps, and it's almost never going to be more than 3 or 4 levels deep, so, what, like maybe max 10 goroutines waiting?
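To make the deadlock described above concrete, here is a minimal sketch in plain Go. It does not use mage's internals: `slots`, `runDeps`, and `acquireTimeout` are hypothetical names, and a timeout stands in for the real hang so that the program terminates. Each dependency takes a slot and keeps holding it while its own dependency runs, which is the pattern that stalls when j is 1.

```go
package main

import (
	"fmt"
	"time"
)

// slots is a hypothetical counting semaphore; its capacity plays the role of j.
type slots chan struct{}

// acquireTimeout is an assumption of this sketch: a real implementation
// would block forever instead of giving up.
const acquireTimeout = 50 * time.Millisecond

// tryAcquire waits up to acquireTimeout for a free slot.
func (s slots) tryAcquire() bool {
	select {
	case s <- struct{}{}:
		return true
	case <-time.After(acquireTimeout):
		return false
	}
}

func (s slots) release() { <-s }

// runDeps runs deps[0], which depends on deps[1], and so on.
// Each dependency holds its slot while the dependencies below it run.
func runDeps(s slots, deps []string) error {
	if len(deps) == 0 {
		return nil
	}
	if !s.tryAcquire() {
		return fmt.Errorf("%s: no free slot, a real run would deadlock here", deps[0])
	}
	defer s.release()
	// The slot is still held at this point, so a deeper dependency
	// needs a second slot before this one can ever finish.
	return runDeps(s, deps[1:])
}

func main() {
	// T1 is the top-level target; T2 and T3 are its nested dependencies.
	fmt.Println("j=1:", runDeps(make(slots, 1), []string{"T2", "T3"}))
	fmt.Println("j=2:", runDeps(make(slots, 2), []string{"T2", "T3"}))
}
```

With one slot the chain stalls at T3; with two slots it completes. Any limit smaller than the depth of the dependency chain hits the same problem as long as a slot is held while waiting.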

natefinch avatar Nov 10 '19 03:11 natefinch

The use case that I'm interested in is using mage to control workflows with potentially very resource-expensive parts. These parts are run in parallel and, because of their expense, can bring down the workflow if too many are run at once. Maybe this needs to be a component that is not so tightly integrated into the mage runtime?

kortschak avatar Nov 10 '19 04:11 kortschak

You can use mg.SerialDeps to run dependencies without parallelization. Maybe that is sufficient?

func SerialDeps(fns ...interface{})

SerialDeps is like Deps except it runs each dependency serially, instead of in parallel. This can be useful for resource intensive dependencies that shouldn't be run at the same time.

natefinch avatar Nov 10 '19 12:11 natefinch

No, not really. I'm looking at this as a workflow manager for bioinformatic pipelines. The workflow components are large and long lived, and the machines are also large. So often you can run some parts in parallel, but the parallelisation is limited and the constraint is often system memory. The two approaches (out of the box) either risk having components brought down by the OOM killer (unlimited parallelisation) or make jobs take longer and waste computing resources (the sequential approach). It would be nice if there were a middle way.

kortschak avatar Nov 10 '19 20:11 kortschak

Can you give me a magefile that stubs out some targets and how they interact, so I have a better idea of what you need?

As an aside, it's possible the answer is just not to use mage targets directly. Clearly this is something that can be modeled in Go, so the answer may be to write it directly for your use case, and use mage to initiate it (or not... maybe mage is not the right tool).

That being said, I do think it's valuable to be able to easily control the parallelism of running tasks, so I'll try to think about how dependencies are run and if there's a way to limit them without easily getting deadlocked.
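As a rough illustration of modeling this directly in Go, the sketch below caps the number of jobs running at once with a buffered channel used as a semaphore. It is independent of mage: `runLimited` is a hypothetical helper, and the jobs are flat (they do not start further limited jobs), so the deadlock discussed earlier cannot occur here.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runLimited runs every job with at most limit running at the same time
// and returns the highest concurrency it observed.
func runLimited(limit int, jobs []func()) int {
	sem := make(chan struct{}, limit) // one buffer entry per allowed job
	var wg sync.WaitGroup
	var running, peak int64
	for _, job := range jobs {
		wg.Add(1)
		go func(job func()) {
			defer wg.Done()
			sem <- struct{}{}        // blocks while limit jobs are running
			defer func() { <-sem }() // frees the slot when the job returns
			n := atomic.AddInt64(&running, 1)
			for { // record the peak without a lock
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			job()
			atomic.AddInt64(&running, -1)
		}(job)
	}
	wg.Wait()
	return int(peak)
}

func main() {
	// Six stand-in jobs; a real pipeline step would go where Sleep is.
	jobs := make([]func(), 6)
	for i := range jobs {
		jobs[i] = func() { time.Sleep(20 * time.Millisecond) }
	}
	peak := runLimited(2, jobs)
	fmt.Println("peak concurrency within limit:", peak <= 2)
}
```

A mage target could call a helper like this itself instead of using mg.Deps for the expensive steps, at the cost of losing mage's own dependency tracking for those steps.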

natefinch avatar Nov 12 '19 02:11 natefinch

> The problem with this is that it can really easily deadlock. To run dependencies last to first, mage just fires off a bunch of goroutines and waits for them all to exit. It will do this several levels deep, so you'll often have N goroutines firing off. So if you have T1->T2->T3 and j is 1, T2 will use up a slot and then it'll deadlock when it wants to start T3 but there's no more slots.

I believe that if dependencies were resolved like a graph, as demonstrated in the article below, this would be doable. Additionally, setting parallelism to 1 would essentially turn your entire workflow into a serial runner without needing to explicitly set it as such.

https://dnaeon.github.io/dependency-graph-resolution-algorithm-in-go/
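A minimal sketch of the graph idea, under stated assumptions: it uses a Kahn-style topological scheduler, which is not necessarily the algorithm in the linked article and is not how mage works today. `runGraph` and its `deps` map are hypothetical. A target only starts once all of its dependencies have finished, so no running target ever waits while holding a slot, and a limit of 1 yields a purely serial run.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// runGraph runs every target after all of its dependencies, with at most j
// targets running at once, and returns the order in which they finished.
// deps maps a target name to the names it depends on.
func runGraph(deps map[string][]string, j int, run func(name string)) ([]string, error) {
	if j < 1 {
		j = 1
	}
	pending := make(map[string]int)         // unfinished dependencies per target
	dependents := make(map[string][]string) // reverse edges
	for t, ds := range deps {
		if _, ok := pending[t]; !ok {
			pending[t] = 0
		}
		for _, d := range ds {
			pending[t]++
			dependents[d] = append(dependents[d], t)
			if _, ok := pending[d]; !ok {
				pending[d] = 0
			}
		}
	}
	var ready []string
	for t, n := range pending {
		if n == 0 {
			ready = append(ready, t)
		}
	}
	sort.Strings(ready) // stable starting order

	done := make(chan string)
	var order []string
	running := 0
	for len(order) < len(pending) {
		// Start ready targets while slots are free.
		for running < j && len(ready) > 0 {
			t := ready[0]
			ready = ready[1:]
			running++
			go func() { run(t); done <- t }()
		}
		if running == 0 {
			// Nothing is running and nothing is ready: the rest form a cycle.
			return order, errors.New("dependency cycle")
		}
		t := <-done
		running--
		order = append(order, t)
		for _, d := range dependents[t] {
			pending[d]--
			if pending[d] == 0 {
				ready = append(ready, d)
			}
		}
	}
	return order, nil
}

func main() {
	// T1 depends on T2, which depends on T3.
	deps := map[string][]string{"T1": {"T2"}, "T2": {"T3"}}
	order, err := runGraph(deps, 1, func(name string) {})
	fmt.Println(order, err)
}
```

Because slots are only taken by targets that are ready to run, the T1->T2->T3 chain completes even with j set to 1.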

ghostsquad avatar Mar 16 '20 04:03 ghostsquad