spatialdata
Adding `.pipe` to `SpatialData`
Is your feature request related to a problem? Please describe.
It would be convenient to be able to chain function calls on a SpatialData object. Currently, given some functions f, g, h:
```python
from typing import Any
import spatialdata as sd
def f(sdata: sd.SpatialData) -> sd.SpatialData: ...
def g(sdata: sd.SpatialData, arg1: Any, arg2: Any) -> sd.SpatialData: ...
def h(arg3: Any, sdata: sd.SpatialData) -> sd.SpatialData: ...
```
We would have to do the following:
```python
sdata_h = h(arg3=c, sdata=g(f(sdata), arg1=a, arg2=b))
# or, step by step:
sdata_f = f(sdata)
sdata_g = g(sdata_f, arg1=a, arg2=b)
sdata_h = h(arg3=c, sdata=sdata_g)
```
Describe the solution you'd like
pandas and xarray provide `pipe` methods for DataFrames, DataArrays and Datasets. Following their examples, a `SpatialData.pipe` could be used like so:
```python
sdata.pipe(f).pipe(g, arg1=a, arg2=b).pipe((h, "sdata"), arg3=c)
```
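A minimal sketch of how that chain would evaluate, assuming a hypothetical stand-in class `FakeSData` in place of `sd.SpatialData` (so it runs without spatialdata installed) and toy versions of f, g, h that only record which steps ran. The `pipe` method follows the xarray-style dispatch quoted under "Additional context"; the chained call should give the same result as the nested call.

```python
from __future__ import annotations

from typing import Any, Callable


class FakeSData:
    """Hypothetical stand-in for sd.SpatialData; records the steps applied to it."""

    def __init__(self, steps: tuple[str, ...] = ()) -> None:
        self.steps = steps

    def pipe(self, func: Callable[..., Any] | tuple[Callable[..., Any], str], *args: Any, **kwargs: Any) -> Any:
        # A (func, "name") tuple passes self as that keyword argument
        # instead of as the first positional argument.
        if isinstance(func, tuple):
            func, target = func
            if target in kwargs:
                raise ValueError(f"{target} is both the pipe target and a keyword argument")
            kwargs[target] = self
            return func(*args, **kwargs)
        return func(self, *args, **kwargs)


def f(sdata: FakeSData) -> FakeSData:
    return FakeSData(sdata.steps + ("f",))


def g(sdata: FakeSData, arg1: Any, arg2: Any) -> FakeSData:
    return FakeSData(sdata.steps + (f"g({arg1},{arg2})",))


def h(arg3: Any, sdata: FakeSData) -> FakeSData:
    # The object is not the first parameter here, hence the tuple form below.
    return FakeSData(sdata.steps + (f"h({arg3})",))


a, b, c = 1, 2, 3
sdata = FakeSData()
nested = h(arg3=c, sdata=g(f(sdata), arg1=a, arg2=b))
piped = sdata.pipe(f).pipe(g, arg1=a, arg2=b).pipe((h, "sdata"), arg3=c)
print(piped.steps)  # ('f', 'g(1,2)', 'h(3)')
```

The tuple form is only needed for functions like h whose SpatialData parameter is not first.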
Describe alternatives you've considered
If a user has their own custom SpatialData accessors providing f, g, h (where h's first argument is the SpatialData object, i.e. self, in this case), `pipe` should work just the same, but wrapping each accessor call in a lambda makes the code rather wordy:
```python
sdata = (
    sdata.pipe(lambda s: s.my_accessor.f())
    .pipe(lambda s: s.my_accessor.g(arg1=a, arg2=b))
    .pipe(lambda s: s.my_accessor.h(arg3=c))
)
```
Just chaining the accessor is much easier to read in this instance.
```python
sdata.my_accessor.f().my_accessor.g(arg1=a, arg2=b).my_accessor.h(arg3=c)
```
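A minimal sketch of the accessor pattern this comparison relies on, using a plain `property` on a hypothetical `FakeSData` class and a hypothetical `MyAccessor`. It does not use or assume any spatialdata accessor-registration API. Each accessor method returns a new object, which is what makes `.my_accessor` chainable; the lambda-based `pipe` version should produce the same result.

```python
from __future__ import annotations

from typing import Any, Callable


class MyAccessor:
    """Hypothetical accessor; each method returns a new FakeSData so calls chain."""

    def __init__(self, sdata: FakeSData) -> None:
        self._sdata = sdata

    def f(self) -> FakeSData:
        return self._sdata.with_step("f")

    def g(self, arg1: Any, arg2: Any) -> FakeSData:
        return self._sdata.with_step(f"g({arg1},{arg2})")

    def h(self, arg3: Any) -> FakeSData:
        return self._sdata.with_step(f"h({arg3})")


class FakeSData:
    """Hypothetical stand-in for sd.SpatialData."""

    def __init__(self, steps: tuple[str, ...] = ()) -> None:
        self.steps = steps

    def with_step(self, name: str) -> FakeSData:
        return FakeSData(self.steps + (name,))

    @property
    def my_accessor(self) -> MyAccessor:
        return MyAccessor(self)

    def pipe(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        # Simplified pipe without the tuple-target form; enough for lambdas.
        return func(self, *args, **kwargs)


a, b, c = 1, 2, 3
sdata = FakeSData()
chained = sdata.my_accessor.f().my_accessor.g(arg1=a, arg2=b).my_accessor.h(arg3=c)
piped = (
    sdata.pipe(lambda s: s.my_accessor.f())
    .pipe(lambda s: s.my_accessor.g(arg1=a, arg2=b))
    .pipe(lambda s: s.my_accessor.h(arg3=c))
)
```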
For accessors, piping becomes more useful when there are helper functions composed of several calls to the accessor's methods:
```python
def f(sdata: sd.SpatialData, arg1, arg2) -> sd.SpatialData:
    intermediate_sdata = sdata.my_accessor.h(arg1).my_accessor.g(arg2)
    something_has_been_done = do_something_else(intermediate_sdata)
    return something_has_been_done


def i(sdata: sd.SpatialData, arg3) -> sd.SpatialData:
    intermediate_sdata = sdata.my_accessor.h(arg3)
    something_has_been_done2 = do_something_else2(intermediate_sdata)
    return something_has_been_done2


modified_sdata = sdata.pipe(f, arg1=a, arg2=b).pipe(i, arg3=c)
```
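A runnable sketch of that composite-function case, assuming a hypothetical `FakeSData` that acts as its own accessor (a toy shortcut) and hypothetical stand-ins for the issue's placeholder functions `do_something_else` and `do_something_else2`. It shows that `pipe` forwards keyword arguments to helpers that mix accessor calls with other work.

```python
from __future__ import annotations

from typing import Any, Callable


class FakeSData:
    """Hypothetical stand-in for sd.SpatialData that records applied steps."""

    def __init__(self, steps: tuple[str, ...] = ()) -> None:
        self.steps = steps

    def add(self, step: str) -> FakeSData:
        return FakeSData(self.steps + (step,))

    @property
    def my_accessor(self) -> FakeSData:
        # Toy shortcut: the object doubles as its own accessor.
        return self

    def h(self, arg: Any) -> FakeSData:
        return self.add(f"h({arg})")

    def g(self, arg: Any) -> FakeSData:
        return self.add(f"g({arg})")

    def pipe(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        return func(self, *args, **kwargs)


# Hypothetical placeholders for the non-accessor work in the issue.
def do_something_else(sdata: FakeSData) -> FakeSData:
    return sdata.add("else")


def do_something_else2(sdata: FakeSData) -> FakeSData:
    return sdata.add("else2")


def f(sdata: FakeSData, arg1: Any, arg2: Any) -> FakeSData:
    intermediate_sdata = sdata.my_accessor.h(arg1).my_accessor.g(arg2)
    return do_something_else(intermediate_sdata)


def i(sdata: FakeSData, arg3: Any) -> FakeSData:
    intermediate_sdata = sdata.my_accessor.h(arg3)
    return do_something_else2(intermediate_sdata)


modified_sdata = FakeSData().pipe(f, arg1=1, arg2=2).pipe(i, arg3=3)
print(modified_sdata.steps)  # ('h(1)', 'g(2)', 'else', 'h(3)', 'else2')
```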
Additional context
Implementation: the following is adapted from https://github.com/pydata/xarray/blob/d33e4ad9407591cc7287973b0f8da47cae396004/xarray/core/common.py#L717-L847
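```python
from __future__ import annotations

from typing import Any, Callable, TypeVar

T = TypeVar("T")


class SpatialData:
    ...  # existing SpatialData attributes and methods

    def pipe(
        self,
        func: Callable[..., T] | tuple[Callable[..., T], str],
        *args: Any,
        **kwargs: Any,
    ) -> T:
        # A (callable, "keyword") tuple means: pass self as that keyword
        # argument instead of as the first positional argument.
        if isinstance(func, tuple):
            func, target = func
            if target in kwargs:
                raise ValueError(f"{target} is both the pipe target and a keyword argument")
            kwargs[target] = self
            return func(*args, **kwargs)
        return func(self, *args, **kwargs)
```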
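A small self-contained check of the two dispatch paths in that implementation and of its guard against passing the target twice. The class `FakeSData` and the functions `first_positional` and `keyword_target` are hypothetical, used so the snippet runs without importing spatialdata.

```python
from __future__ import annotations

from typing import Any, Callable, TypeVar

T = TypeVar("T")


class FakeSData:
    """Hypothetical stand-in carrying only the proposed pipe method."""

    def pipe(self, func: Callable[..., T] | tuple[Callable[..., T], str], *args: Any, **kwargs: Any) -> T:
        if isinstance(func, tuple):
            func, target = func
            if target in kwargs:
                raise ValueError(f"{target} is both the pipe target and a keyword argument")
            kwargs[target] = self
            return func(*args, **kwargs)
        return func(self, *args, **kwargs)


def first_positional(sdata: FakeSData, x: int) -> tuple[FakeSData, int]:
    return sdata, x


def keyword_target(x: int, sdata: FakeSData) -> tuple[FakeSData, int]:
    return sdata, x


s = FakeSData()
positional = s.pipe(first_positional, 1)              # self passed first
by_keyword = s.pipe((keyword_target, "sdata"), 2)     # self passed as sdata=
message = ""
try:
    # Supplying sdata= explicitly clashes with the pipe target.
    s.pipe((keyword_target, "sdata"), x=3, sdata=s)
except ValueError as err:
    message = str(err)
print(message)  # sdata is both the pipe target and a keyword argument
```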
Because `pipe` returns whatever `func` returns, users who chain multiple `pipe` calls have to make sure every function except possibly the last one returns a SpatialData object.
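A minimal illustration of that pitfall, assuming a hypothetical `FakeSData` holding only an element count and hypothetical `drop_one`/`count` functions. A non-SpatialData result is fine at the end of a chain, but chaining past it fails because the returned value has no `pipe` method.

```python
from __future__ import annotations

from typing import Any, Callable, TypeVar

T = TypeVar("T")


class FakeSData:
    """Hypothetical stand-in for sd.SpatialData holding only an element count."""

    def __init__(self, n_elements: int) -> None:
        self.n_elements = n_elements

    def pipe(self, func: Callable[..., T], *args: Any, **kwargs: Any) -> T:
        return func(self, *args, **kwargs)


def drop_one(sdata: FakeSData) -> FakeSData:
    return FakeSData(sdata.n_elements - 1)


def count(sdata: FakeSData) -> int:
    return sdata.n_elements


sdata = FakeSData(3)
n = sdata.pipe(drop_one).pipe(count)  # fine: the int result ends the chain
try:
    sdata.pipe(count).pipe(drop_one)  # count returns an int, which has no .pipe
    chain_failed = False
except AttributeError:
    chain_failed = True
```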
References:
- pandas `DataFrame.pipe()`
- xarray `DataArray.pipe()`
- For much more advanced pipeline functionality, the `returns` package also provides some interesting implementations, specifically its `flow` and `pipe` functions.
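For comparison, a minimal stdlib sketch of the "flow" idea: a hypothetical `flow(value, *funcs)` helper that threads a value through single-argument functions with `functools.reduce`, binding extra arguments up front with `functools.partial`. It is not the `returns` package's implementation or API, only an illustration of the same left-to-right composition that chained `pipe` calls express.

```python
from functools import partial, reduce
from typing import Any, Callable


def flow(value: Any, *funcs: Callable[[Any], Any]) -> Any:
    """Hypothetical helper: apply funcs left to right, feeding each result to the next."""
    return reduce(lambda acc, func: func(acc), funcs, value)


def scale(x: float, factor: float) -> float:
    return x * factor


def shift(x: float, offset: float) -> float:
    return x + offset


# Equivalent to shift(scale(2.0, factor=3.0), offset=1.0)
result = flow(2.0, partial(scale, factor=3.0), partial(shift, offset=1.0))
```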
Hi @srivarra, this sounds like a very interesting feature! What could be a use case for it at the moment?
@giovp Currently, this would be useful for some of the pipelines I've created with SpatialData objects in my analysis project, where multiple functions are called sequentially on the same object. It's not a super important feature or use case; it's more of a convenience utility.
Sounds very interesting, @srivarra. I personally don't have a lot of capacity for new features at the moment, but if you feel like submitting a PR, I would be very happy to support it!
@srivarra Would you be willing to help implement this? If so we could schedule a meeting and work on it together if you would like.
@melonora Yeah, I'd be willing to implement this. I was on vacation for a while, but I'll open a PR soonish.
Thanks @srivarra, the feature looks very interesting! Happy to review the PR 😊