pydra
pydra copied to clipboard
Add support for arbitrary mappings in a FunctionTask
Types of changes
- New feature (non-breaking change which adds functionality)
Summary
This PR adds support for returning total or partial mappings of the output spec by the wrapped function, as an alternative to going through a conversion from the returned tuple to a dictionary via output_names. Unlike the named tuple case where the shape of the data is known a priori (I have got exactly 2 output variables which I want named to foo and bar), this PR adds support for functions which may return more, less or exactly these data as an arbitrary mapping.
This could be particularly useful for dynamic task construction use cases, like BIDS readers, parsers and writers.
Checklist
- [ ] I have added tests to cover my changes (if necessary)
- [ ] I have updated documentation (if necessary)
For a concrete example, it would simplify this piece of terrible code very much:
https://github.com/aramis-lab/pydra-bids/blob/main/pydra/tasks/bids/utils/read_bids.py
A safer alternative could also be to only return if the mapping returned by the inner function contains all keys declared in output_names. Right now, I went for the coalesce to None option, since this is what we are going for in case the function did not return anything. I am happy with either.
a side note: for dynamic tasks as being targeted here, the cache checksum/location will only depend on inputs (e.g. the query string/template), not the underlying database. hence either the source directory would need to be provided and cached as a separate input(a potentially expensive operation), or another default input should be associated that uses a random number generator as a default. the current forcing option of self.rerun_task may not suffice in all cases, especially if the same task gets called multiple times.
Good point. How does caching work today with nipype for things like BIDSDataGrabber and DataSink? Would it work differently with pydra fundamentally?
How does caching work today with nipype for things like BIDSDataGrabber and DataSink?
caching is local to a workflow and nipype1 does not have the concept of reusing the same task in multiple places or even across workflows. this is only available in pydra. in nipype1, simply setting the rerun flag would force that interface to rerun, but because there is no shared rerun capabilities, this particular issue does not come into play. so dynamic interfaces are simply set to rerun by default in nipype1.
Codecov Report
Base: 81.23% // Head: 38.26% // Decreases project coverage by -42.96% :warning:
Coverage data is based on head (
0f8869b) compared to base (a9a116d). Patch coverage: 0.00% of modified lines in pull request are covered.
:exclamation: Current head 0f8869b differs from pull request most recent head b89f9a2. Consider uploading reports for the commit b89f9a2 to get more accurate results
Additional details and impacted files
@@ Coverage Diff @@
## master #565 +/- ##
===========================================
- Coverage 81.23% 38.26% -42.97%
===========================================
Files 20 20
Lines 4391 4393 +2
Branches 1262 1202 -60
===========================================
- Hits 3567 1681 -1886
- Misses 820 2416 +1596
- Partials 4 296 +292
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 38.24% <0.00%> (-43.00%) |
:arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
| Impacted Files | Coverage Δ | |
|---|---|---|
| pydra/engine/task.py | 47.09% <0.00%> (-46.45%) |
:arrow_down: |
| pydra/tasks/__init__.py | 0.00% <0.00%> (-100.00%) |
:arrow_down: |
| pydra/mark/functions.py | 17.64% <0.00%> (-82.36%) |
:arrow_down: |
| pydra/engine/audit.py | 27.50% <0.00%> (-70.00%) |
:arrow_down: |
| pydra/engine/state.py | 30.86% <0.00%> (-65.54%) |
:arrow_down: |
| pydra/engine/graph.py | 31.89% <0.00%> (-62.80%) |
:arrow_down: |
| pydra/engine/helpers_state.py | 35.08% <0.00%> (-60.53%) |
:arrow_down: |
| pydra/utils/messenger.py | 47.94% <0.00%> (-47.95%) |
:arrow_down: |
| pydra/engine/helpers.py | 39.00% <0.00%> (-47.76%) |
:arrow_down: |
| pydra/engine/specs.py | 48.21% <0.00%> (-46.56%) |
:arrow_down: |
| ... and 9 more |
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.
:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.
@ghisvail - can't access the link with the example anymore.
perhaps you can also add some test, e.g. pydra/mark/tests/test_functions.pyand/orpydra/engine/tests/test_task.py`
Yes, I am planning to add some unit tests covering this particular usage.
I'll ping you once done :+1
Yes, I am planning to add some unit tests covering this particular usage.
I'll ping you once done :+1
I added a test case which covers present, absent and undefined mappings from returned dict to annotation.
A natural extension to this PR would be to handle functions annotated with a TypedDict return value.
@ghisvail - can't access the link with the example anymore.
I did some refactoring of the code in the meantime. This is what it looks like today:
https://github.com/aramis-lab/pydra-bids/blob/main/pydra/tasks/bids/utils.py
Granted it's likely a terrible way of making dynamic tasks for reasons I have not thought of, but I sense there could be a nice pattern emerging from this.
@djarecka do you think we could get this one in for the release too?
I was also thinking that extending this feature to TypedDict may help reduce typing redundancy by doing:
class Point(TypedDict):
x: int
y: int
@task
def random_point() -> Point:
return Point(x=random.randrange(16), y=random.randrange(16))
And have the resulting task automatically expose x and y as integer-typed output.
Maybe?
if you think that you could use it, then sure
Going forward, I think it would be good to deprecate {'name': type} for TypedDict, which didn't exist when we initially came up with the return annotations.
Going forward, I think it would be good to deprecate
{'name': type}forTypedDict, which didn't exist when we initially came up with the return annotations.
We might not need to deprecate it, since TypedDict can be constructed dynamically. So we could normalize tasks internals to use TypedDict but keep annotate as a user-friendly wrapper to help generate the TypedDict definition from the mapping passed in annotate.
What do you think of this plan?
Sounds good to me.
Ok, then I'll find some time to start hacking on it 👍
@ghisvail - you want to still work in this PR, or you want me to merge this one and open a new one?
@djarecka please go ahead with merging. I'll open a follow-up PR for TypedDict.