pydra
What features are essential for a Minimum Viable Product (MVP)
Looking through a Nipype 1 lens, please add to this issue the set of pydra features that you would like to see for a Minimum Viable Product (MVP). We would like to use this list to target any immediate development.
I would suggest we create a list and then take a vote.
Please spread the word.
cc: @tclose @ghisvail @omar-rifai
I'm really liking a lot of things about Pydra but I feel like I'm still a little way off understanding all its limitations. Apart from toolkit task interface coverage, the main area for improvement I can see is the debugging experience. Not sure if this is strictly necessary for an MVP, but it kind of depends on who you are releasing it to (i.e. experienced programmers or neuroscience newbies).
Some specific issues I can think of are:
- If you incorrectly connect tasks then you get silent failures that are unintuitive to debug.
- For example, if an upstream task doesn't set an output (say, there is an issue with the interface and the file isn't created), then unless that output is set as mandatory (which outputs aren't by default), the error shows up as the downstream node not having an input connected. Maybe there could be a way to flag a field as mandatory when it is connected to a downstream task.
- Type-checking doesn't seem to be enforced, which is kind of Pythonic, but given the lengths we go to in specifying the types, it seems a bit of a waste when they aren't getting picked up by a linter either.
- Provide more precise descriptive error messages in the logs at the bottom of the traceback, rather than having to scroll up through the logs to find the relevant node(s) that failed
- Would also be great to have a tool that reads the stored pickles and recapitulates the failed node (and its environment) so it can be easily debugged
Hope this is useful. I will keep a list of other potential improvements as I go along, if that would help.
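To make the pickle-replay idea above a bit more concrete, here is a minimal sketch in plain Python. It does not use Pydra's API or its actual crash-file layout: the `TASKS` registry, `run_and_record`, `replay`, and the record format (task name, inputs, error string) are all hypothetical names chosen for illustration, and capturing the node's environment is left out.

```python
import pickle
from pathlib import Path


def divide(a, b):
    """Toy task standing in for a real interface."""
    return a / b


# Hypothetical registry mapping task names to callables.
TASKS = {"divide": divide}


def run_and_record(task_name, inputs, crash_dir):
    """Run a task; on failure, pickle what is needed to re-run it, then re-raise."""
    try:
        return TASKS[task_name](**inputs)
    except Exception as exc:
        crash_file = Path(crash_dir) / f"{task_name}_crash.pkl"
        record = {"task": task_name, "inputs": inputs, "error": repr(exc)}
        with open(crash_file, "wb") as f:
            pickle.dump(record, f)
        raise


def replay(crash_file):
    """Load a stored crash record and re-run the failed task with the same inputs."""
    with open(crash_file, "rb") as f:
        record = pickle.load(f)
    print(f"Replaying {record['task']} (original error: {record['error']})")
    return TASKS[record["task"]](**record["inputs"])
```

Calling `replay(path)` under a debugger (for example `python -m pdb`) would then stop at the original failure without re-running the rest of the workflow.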
I second @tclose's feedback. The potential for a better nipype is there, but the developer experience still feels quite rough around the edges.
It seems to me that the initial implementation efforts on pydra focused on the happy path, leaving error reporting and debugging as an exercise for the developer. And the latter is very painful. The stack traces in particular are difficult to digest because of the coroutine-heavy runtime.
The point about type safety is quite important. It often happens that a workflow where types seem to align in theory (no error on construction) blows up at runtime with an unhashable type or list index out of range. Type checking cannot be enforced anyway since this is Python, but an API where the type system is sound and provides strong safety guarantees is definitely where modern Python is heading (similar to TypeScript compared to JavaScript).
I am trying to open PRs and issues for specific stuff I find as I progress in my pydra journey.
I was thinking that there could be a validation function that runs just before a workflow is executed, which steps through the nodes and does some basic checks of types and connections.
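A rough sketch of what such a pre-execution check might look like, in plain Python. This is not Pydra's API: `Node`, `Connection`, and `validate_workflow` are hypothetical stand-ins, and the type check is a simple `issubclass` test that ignores generics and unions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical stand-in for a workflow task with typed fields."""
    name: str
    input_types: Dict[str, type]
    output_types: Dict[str, type]


@dataclass
class Connection:
    """Hypothetical edge: source.output feeds target.input."""
    source: str
    output: str
    target: str
    input: str


def validate_workflow(nodes: List[Node], connections: List[Connection]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    by_name = {node.name: node for node in nodes}
    problems = []
    for c in connections:
        src, dst = by_name.get(c.source), by_name.get(c.target)
        if src is None or dst is None:
            problems.append(f"{c.source} -> {c.target}: unknown node")
            continue
        if c.output not in src.output_types:
            problems.append(f"{c.source} has no output '{c.output}'")
            continue
        if c.input not in dst.input_types:
            problems.append(f"{c.target} has no input '{c.input}'")
            continue
        out_type = src.output_types[c.output]
        in_type = dst.input_types[c.input]
        if not issubclass(out_type, in_type):
            problems.append(
                f"{c.source}.{c.output} ({out_type.__name__}) does not match "
                f"{c.target}.{c.input} ({in_type.__name__})"
            )
    return problems
```

Collecting all problems in a list, rather than raising on the first one, means a single run reports every bad connection by node and field name.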
Hi @satra, @djarecka, I'd agree with the points above about improving the debugging experience, and I would also add more comprehensive documentation.
- Currently there is the user guide, which is very useful for getting into Pydra but quickly loses its relevance once we start coding because it includes no coding examples.
- The notebooks are also very useful but are somewhat minimalist and don't comprehensively cover all the methods of the classes or tricky use cases (I'm thinking of complex combine examples, for instance, where the syntax is non-trivial and undefined if we want to combine on variables from upstream tasks).
That being said, having discovered Nipype and Pydra relatively recently, I find Pydra to be much more intuitive in its syntax, and I very much appreciate the features it offers. So thank you for the efforts :)