ducttape
ducttape copied to clipboard
Smarter traversal with -j flag
ducttape-0.3 defaults to depth-first traversal of the realization graph in order to try different kind of tasks quickly (and fail fast). But when the user elects to run multiple processes, this ordering for certain workflows will actually prevent parallelism. For example, if the realization graph is like
a.1 --> b.1 -+-> c --> d
a.2 --> b.2 /
(tasks a
and b
having two realizations each, both of which are prerequisites to c
), then the depth-first strategy will yield a.1 b.1 a.2 b.2 c d
, where the only opportunity for parallelism is b.1
and a.2
. Because c
depends on two branches, this would actually fail more slowly than the breadth-first strategy of a.1 a.2 b.1 b.2 c d
, because the a.1/a.2
and b.1/b.2
might both run in parallel given -j2
.
I suspect a smarter strategy might be a hybrid depth/breadth traversal, where the breadth is constrained by the number of things allowed to execute in parallel. Thus -j2
would be equivalent to breadth-first for the structure above, while -j3
would cause the traversal to start with up to 3 ready tasks, and so forth.
Or, can the traversal order be computed on the fly? Then after starting a.1
with -j2
it would realize that b.1
is not ready and therefore start a.2
in parallel.
A related problem is that if a realization of a task fails, it is easy to miss it when other things are running in parallel. Maybe there could be an option to have ducttape immediately suspend other running or scheduled realizations of a task if one of them fails?
I'm skeptical about immediately stopping. More often than not, I don't want to kill a tuning run that's been going for 12 hours on 32 cores just because I made a simple typo in a results formatting script for some barely-related realization.
So far I haven't seen enough painful situations come out of this. I generally just open up the number of tasks to execute in parallel to something huge and let a job scheduler figure it out.
If you're interested in spending some cycles on this, let me know. The traversal order is implemented as an agenda sort via the Ordering[D] parameter passed to https://github.com/jhclark/ducttape/blob/master/src/main/scala/ducttape/hyperdag/walker/UnpackedPhantomMetaDagWalker.scala.
Here's the primary implementation of traversals (it's simple!): https://github.com/jhclark/ducttape/blob/master/src/main/scala/ducttape/hyperdag/walker/Traversal.scala
You can see it getting sloshed around here: https://github.com/jhclark/ducttape/blob/master/src/main/scala/ducttape/cli/ExecuteMode.scala.