
more asynchronous pydra - different from how it's currently implemented

Open satra opened this issue 4 years ago • 0 comments

yesterday i ran a workflow that made me think about how pydra could handle it

  • (shell) use dcm2niix to convert groups of dicoms
  • (singularity) run kwyk using singularity + docker on a gpu (here i optimized to run multiple files through the same GPU process, but this could be parallelized if the configuration allowed it)
  • (python) use nibabel to do some computation on the 3 images that resulted from the first two steps.
  • (python) use pandas to do some aggregation

i realized as i was doing this that i was working in parts. the kwyk process was running in the background as a single process generating outputs, while i was iteratively getting summaries from the next two steps for however many subjects had run through. so the main process had not concluded, but i was effectively emitting info that i could use in downstream tasks, which were themselves caching and updating as new triggers came in.

i suspect as we deal with really large datasets, some sort of asynchronous execution with message passing would be nice to have.
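
to make the idea concrete, here is a minimal sketch of that pattern using only python's standard `asyncio`. it is not pydra's api and not a proposal for how pydra should implement it. the names `produce`, `aggregate`, `run_pipeline` and the `volume` field are hypothetical, and the per-subject "result" is a placeholder computed from the subject id. the point is only that the upstream step emits one message per subject while the downstream aggregation updates as each message arrives, before the upstream step has finished.

```python
import asyncio

# Sentinel object used to signal that the upstream step has finished.
_DONE = object()


async def produce(subjects, queue):
    """Stand-in for the long-running step (e.g. the GPU process).

    Emits one message per subject as soon as that subject is done.
    The 'volume' value is a placeholder, not a real measurement.
    """
    for subject in subjects:
        await asyncio.sleep(0)  # placeholder for real work; yields control
        await queue.put({"subject": subject, "volume": float(len(subject))})
    await queue.put(_DONE)


async def aggregate(queue, snapshots):
    """Downstream step: update a running mean each time a message arrives."""
    total = 0.0
    count = 0
    while True:
        msg = await queue.get()
        if msg is _DONE:
            break
        total += msg["volume"]
        count += 1
        # Record the aggregate as it stands after this subject.
        snapshots.append((count, total / count))


async def _main(subjects):
    queue = asyncio.Queue()
    snapshots = []
    # Producer and consumer run concurrently on the same event loop.
    await asyncio.gather(produce(subjects, queue), aggregate(queue, snapshots))
    return snapshots


def run_pipeline(subjects):
    """Run the toy pipeline and return the list of incremental summaries."""
    return asyncio.run(_main(subjects))
```

calling `run_pipeline(["sub-01", "sub-0002"])` returns one snapshot per subject, so the summary is available incrementally rather than only once everything has finished. a real version would also need caching and a way to re-trigger downstream tasks, which this sketch leaves out.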

perhaps this is out of scope, but i wanted to at least consider the possibility and what that would mean architecturally.

satra · Jun 25 '21 14:06