pydra icon indicating copy to clipboard operation
pydra copied to clipboard

[WIP][ENH] Created BoutiquesTask for simiplified execution of Boutiques tools in Pydra

Open ValHayot opened this issue 4 years ago • 15 comments

Types of changes

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)

Summary

Boutiques tools can be executed in Pydra, by, for example, executing a command such as :

pydra.engine.task.BoutiquesTask("zenodo.3267250", name="fsl_bet", bosh_args=[
             "-v{0}:{0}".format(os.path.abspath(args.bids_dir)),
             "-v{0}:{0}".format(os.path.abspath(args.output_dir))],
             infile=wf.lzin.infile,
             maskfile=wf.lzin.maskfile
)

Checklist

  • [ ] All tests passing
  • [ ] I have added tests to cover my changes
  • [ ] I have updated documentation (if necessary)
  • [ ] My code follows the code style of this project (we are using black: you can pip install pre-commit, run pre-commit install in the pydra directory and black will be run automatically with each commit)

Acknowledgment

  • [X] I acknowledge that this contribution will be available under the Apache 2 license.

ValHayot avatar Mar 01 '20 18:03 ValHayot

Thanks @ValHayot for the PR. I wanted to finally start working on Boutiques :)

@satra and I were originally thinking about slightly different approach - reading boutiques spec and running the ShellCommandTask. Perhaps we could have both, but I'm curious what do you think and if you were also considering it

djarecka avatar Mar 03 '20 18:03 djarecka

Either works, but I'd imagine that there'd be a bit of redundancy with ShellCommandTasks. For instance, Boutiques already does all the output file validation, so I'm not quite sure why the user should have to provide an OutputSpec to get more than just rc, stdout and stderr when using Boutiques. However, the most straightforward way to get the output file names and their keybindings is through the Python interface, I believe.

I'm going to CC @glatard as he might have something to add.

ValHayot avatar Mar 04 '20 19:03 ValHayot

Running a Boutiques task through ShellCommand would require forking a shell to call the Boutiques interpreter that would then start another Python interpreter. I thought it would be lighter to just go through the Boutiques Python API. But of course it adds a dependency.

glatard avatar Mar 04 '20 19:03 glatard

i think this is a good starting point. some comments below.

so I'm not quite sure why the user should have to provide an OutputSpec to get more than just rc, stdout and stderr

@ValHayot - how would i select say the brain output of bet and the mask output of bet for downstream tasks. pydra is a dataflow engine so the key aspect of pydra is to be able to construct workflows.

@glatard - going through the api is fine for now. the key thing we wanted to see is if we could give people a quick way to create neuroimaging workflows using the boutiques descriptors of tools. eventually we will convert the nipype interfaces to pydra tasks, but we are going to break away the tools into their own packages. i would also make boutiques an optional add on rather than a pydra dependency.

pydra would put you in a cached working directory when the task is run. so in the above example, the second part of bosh_args would have to be filled in during the run.

satra avatar Mar 05 '20 01:03 satra

@ValHayot - I've been trying to work on you example, and I have several questions. Perhaps, two most important for the beginning:

  • is there a reason why have you used boutiques.descriptor2func instead of a command line tool?
  • i'm getting an error, that it can't find container engine. I do have docker, do I have to have something specific? Do you always have to run in a container?

Perhaps we could also chat one day if you find some time?

djarecka avatar Mar 30 '20 03:03 djarecka

Hey @djarecka , @satra , sorry for taking so long to get back to you. I wanted to fix my PR before replying so I could show you what I meant, and then got busy.

Basically, when you run a task, it produces an ExecutorOutput object, which contains the output files and error codes. Now of course, in my example, I left the OutputSpec to be "out" and just assigned a list of ExecutorOutput filenames to out (see: line 356). However, if you want to set the OutputSpec in the init to ensure that further down the line it doesn't crash because it was not defined prior to execution, parsing the output-files section of the Zenodo descriptor would enable you to do so. Although, perhaps you'd like users to explicitly define the outputs such that it's clearer to them how they should be transferring the data over to the next task?

Feel free to contact me. Should be a bit more available now :)

ValHayot avatar Apr 16 '20 05:04 ValHayot

@ValHayot - thanks for your answer! I was parsing output-files from the zenodo file, but the problem is that not all output is created, it depends on the arguments used with the command. So it would be great to know which output is expected to exist after running the task.

How can I get ExecutorOutput when I run bosh as a cli? I'm almost sure that last week I saw some files that were created by boutiques after running the task that had the list of outputs, but yesterday I couldn't find anything, and I'm not sure anymore if I saw these files, or just wanted to see ;-)

djarecka avatar Apr 16 '20 16:04 djarecka

@djarecka unfortunately the executor output object isn't saved from the CLI, but we could have than done quickly if you need it. In this way, output files wouldn't be known before they are produced though.

If you to know which files will be produced before the tool is run, evaluate can do it like this, but with limitations. For instance:

bosh evaluate zenodo.2602109 '{"in_file": "data.nii.gz", "out_file": "out"}' output-files

would return all possible output files since the tool interface doesn't include conditional outputs.

glatard avatar Apr 19 '20 22:04 glatard

@glatard - I'm thinking what should be the best way to provide output_specification...

One way would be to expect user to specify which output she/he expects and simply return error if the output is not created. The other way is to read the zenodo file and get all of the output-files (as I'm doing now), and at the end check which one are present, and raise an error only when the output that is not created is a part of a workflow connection, but this is also not perfect...

@satra - do you have opinion?

djarecka avatar Apr 21 '20 15:04 djarecka

i think there are two questions implicit in this discussion:

  1. what is the list of output keys? this would be used to allow users to connect outputs of the task to downstream tasks.
  2. what are the values of these keys after execution? this would be used after execution to set the results structure of the task and thus provide values to downstream nodes.

@glatard and @djarecka - one is at the point of task creation, the other is after execution. i think (1) would come from the spec and (2) would come from some serialization of the executor object.

satra avatar Apr 21 '20 16:04 satra

@satra - so, I was more thinking about the first issue, that's why having the list of the output after execution doesn't really help. But the zenodo spec does not give you answer to this questions, since you have all possible outputs, doesn't take into account the list of argument provided to the command. That's why I've started thinking that maybe the user should be required to specify list of outputs (that would have to be a subset of list generated from the zenodo file)

djarecka avatar Apr 21 '20 16:04 djarecka

@djarecka just to be clear: it is possible for a Boutiques spec developer to map outputs to input values, this is done through conditional outputs. It is also possible that they specify outputs to be "optional", that is to appear without a specified relation with inputs.

glatard avatar Apr 21 '20 16:04 glatard

But the zenodo spec does not give you answer to this questions

i think the zenodo spec gives you the answer to (1) just like nipype outputspec. but it does not give the answer to 2. nipype allows connecting any output independent of what the inputs are. whether that output carries a value is dependent on the execution of the interface. nipype does not impose the list_outputs condition on the workflow creator.

satra avatar Apr 21 '20 17:04 satra

@glatard - but I understand that the developer doesn't have to provide conditional outputs, at least in the fsl_bet example I don't see it

@satra - if you are saying that it's ok to connect output that is not created during the execution of the task, then I could indeed create output_spec using zenodo spec. However, will likely have to make some other changes, since right now that would raise an error if the file can't be found

djarecka avatar Apr 21 '20 17:04 djarecka

Yes you are right, conditional outputs are optional.

glatard avatar Apr 21 '20 17:04 glatard