
Pipelines in scikit-image?

Open emmanuelle opened this issue 5 years ago • 5 comments

Would it be interesting to have a tool for building pipelines in scikit-image, which would allow chaining filters? scikit-learn has a pipeline tool for chaining estimators (https://scikit-learn.org/stable/modules/compose.html).

The principal advantage would be simpler code for users, for example if only a few parameters of the underlying functions are exposed and the other ones are fixed. Another advantage is that meta-functions such as our apply_parallel could be called once on the pipelined function instead of on each filter (and hence give lazy evaluation of intermediate filters, I think?). Caching of intermediate results with memoization would be possible too, for faster execution when changing parameters.
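To make the idea concrete, here is a minimal sketch of what such a chaining helper could look like. `make_pipeline` and `smooth_then_edges` are hypothetical names used only for illustration; they are not part of scikit-image. The sketch assumes every step takes the image as its first argument and returns an image.

```python
from functools import partial, reduce

import numpy as np
from skimage import filters


def make_pipeline(*steps):
    """Hypothetical helper: compose callables so each one receives
    the output of the previous one."""
    def run(image):
        return reduce(lambda result, step: step(result), steps, image)
    return run


# Fix most parameters up front so that only the image is left as an argument.
smooth_then_edges = make_pipeline(
    partial(filters.gaussian, sigma=2),
    filters.sobel,
)

rng = np.random.default_rng(0)
image = rng.random((64, 64))
edges = smooth_then_edges(image)
```

Because the result is a single callable, it is the kind of object that could be handed to a meta-function in one go rather than filter by filter.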

The disadvantage is that we should always be wary of adding new code, and I would like to be sure whether having a pipeline mechanism solves a real problem for users, or not.

Anyway, I don't have clear thoughts about this yet, but I'm opening this issue as a placeholder for discussion.

emmanuelle avatar Jul 08 '19 07:07 emmanuelle

Do you think something like dask.delayed could fulfill a similar function? That way, a pipeline can be defined as a Python function, which should be a bit more readable. Maybe you then lose some of the features you mentioned, I'm not sure?
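A rough sketch of that approach, assuming dask is installed. The function name `pipeline` and the choice of filters are only illustrative; the point is that the chain is an ordinary Python function and nothing is evaluated until `.compute()` is called.

```python
import numpy as np
from dask import delayed
from skimage import filters


def pipeline(image, sigma=2):
    """Describe the chain as a plain function; each call is wrapped
    in dask.delayed, so this only builds a task graph."""
    smoothed = delayed(filters.gaussian)(image, sigma=sigma)
    edges = delayed(filters.sobel)(smoothed)
    return edges


rng = np.random.default_rng(0)
image = rng.random((64, 64))

lazy_edges = pipeline(image)   # builds the task graph, no filtering yet
edges = lazy_edges.compute()   # runs gaussian, then sobel
```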

stefanv avatar Jul 08 '19 11:07 stefanv

Thank you @stefanv, dask.delayed might be worth exploring here, yes. I'm not sure if memoization would be possible though, I'll think about it.

emmanuelle avatar Jul 10 '19 16:07 emmanuelle

I think it does have some support for memoization, something like .persist iirc. @jakirkham might be able to chime in?

jni avatar Jul 10 '19 16:07 jni

Is there still interest in this? I am looking for composable solutions for creating ground-truth segmentations in some cellular imaging data, and scikit-image is the basis for where I'm starting. Pipelines like those used in sklearn were my first thought, which is how I ended up here.

ergonyc avatar Jul 28 '22 17:07 ergonyc

For now, I think the best way to chain operations is to use Python.
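As an illustration, a small segmentation chain written as a plain function might look like the sketch below. The function name `segment`, its `sigma` parameter, and the synthetic test image are assumptions made for the example, not a recommended workflow.

```python
import numpy as np
from skimage import filters, measure


def segment(image, sigma=1.0):
    """Smooth, threshold with Otsu's method, then label connected regions."""
    smoothed = filters.gaussian(image, sigma=sigma)
    mask = smoothed > filters.threshold_otsu(smoothed)
    return measure.label(mask)


# Synthetic image: two bright squares on a dark background.
image = np.zeros((64, 64))
image[10:20, 10:20] = 1.0
image[40:55, 40:55] = 1.0

labels = segment(image)
```

Only the parameters worth exposing appear in the signature, and intermediate results stay available for inspection by adding a `return` or a breakpoint.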

Unless you have very well-defined APIs for the functions in the pipeline (e.g. sklearn classifiers), a pipeline mechanism doesn't save you that much typing. But I'd be happy if someone could point out advantages I missed!

stefanv avatar Jul 29 '22 06:07 stefanv