
map node inspection

Open CJ-Wright opened this issue 6 years ago • 3 comments

It might be nice to raise warnings or errors when the signature of the function passed to map et al. does not match the number of incoming nodes. This won't work in all cases, since a function could take *args and **kwargs, which leaves nothing to inspect, but it might work in some interesting cases.

The main benefit is that one could know, before running any data into the pipeline, whether it was going to work. Since we know exactly how many arguments each function call will receive, we can check whether that matches the function's signature.
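
A minimal sketch of such a check, using only the standard library's `inspect` module. The helper name `check_arity` and its parameters are hypothetical and not part of streamz; `n_args` stands for the number of positional arguments the node would pass to the function.

```python
import inspect


def check_arity(func, n_args):
    """Return None if func accepts n_args positional arguments, else a message.

    Hypothetical helper for illustration; not a streamz API.
    """
    try:
        sig = inspect.signature(func)
    except (TypeError, ValueError):
        # Some callables expose no signature, so there is nothing to check.
        return None
    try:
        # Placeholder values are enough: bind() only checks the call shape.
        sig.bind(*([None] * n_args))
    except TypeError as exc:
        name = getattr(func, "__name__", repr(func))
        return f"{name} cannot be called with {n_args} positional argument(s): {exc}"
    return None


def add(x, y):
    return x + y


def anything(*args, **kwargs):
    return args


print(check_arity(add, 2))       # matches, so nothing is reported
print(check_arity(add, 3))       # mismatch, so a message is returned
print(check_arity(anything, 5))  # *args/**kwargs accept any shape
```

Functions that take *args and **kwargs always pass, which is the limitation described above: the check can only catch mismatches when the signature is specific.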

This might help with one of the main issues with streamz: writing pipelines can be more difficult than writing a naked script/notebook.

CJ-Wright avatar Jul 29 '19 17:07 CJ-Wright

I would instead suggest being very conservative, and allowing some sort of verify_inputs=False optional arg.
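
As a rough illustration of the opt-in idea, here is a toy node that only inspects its function when `verify_inputs=True` is passed. `MapNode`, `n_args`, and `update` are stand-ins invented for this sketch and do not reflect how streamz implements its nodes.

```python
import inspect


class MapNode:
    """Toy stand-in for a map-like node; not the streamz class."""

    def __init__(self, func, n_args=1, verify_inputs=False):
        self.func = func
        self.n_args = n_args
        # Conservative default: do no checking unless explicitly asked.
        if verify_inputs:
            self._verify()

    def _verify(self):
        try:
            sig = inspect.signature(self.func)
        except (TypeError, ValueError):
            return  # no introspectable signature, so stay silent
        try:
            sig.bind(*([None] * self.n_args))
        except TypeError as exc:
            raise TypeError(
                f"map function does not accept {self.n_args} argument(s): {exc}"
            ) from exc

    def update(self, *values):
        return self.func(*values)


# Default behaviour is unchanged: the mismatch goes unnoticed at build time.
unchecked = MapNode(lambda x: x, n_args=2)

# With the flag set, the same mismatch fails when the node is constructed.
try:
    MapNode(lambda x: x, n_args=2, verify_inputs=True)
    failed_early = False
except TypeError:
    failed_early = True
```

Keeping the default at `False` means existing pipelines behave exactly as before, and the check only runs for users who ask for it.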

Might also be nice to be able to run a pipeline in "dummy" mode, where the sinks are not actually executed and we just check that data gets there, with reset() called on any aggregation node when done.

martindurant avatar Jul 29 '19 19:07 martindurant

I guess we'd need to implement a reset system first :smile:. One could also consider this existing as a utility, which builds the graph of the pipeline and then checks the inputs for each of the nodes.

A utility function (or context manager?) for the "dummy" mode might also work. Pass in a node; it builds the graph based off that node, builds a copy of the graph with various things (e.g. sinks) turned off, and returns the node in the new graph. This way it won't impact your existing graph.
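
A sketch of what such a utility could look like on a toy graph. The `Node` class and `dry_run_copy` function are hypothetical and far simpler than streamz's own nodes; for brevity the copy only follows downstream edges from the node passed in, whereas a real utility would presumably walk upstream as well.

```python
class Node:
    """Toy pipeline node (hypothetical; not a streamz class)."""

    def __init__(self, func=None, is_sink=False):
        self.func = func
        self.is_sink = is_sink
        self.downstreams = []

    def connect(self, other):
        self.downstreams.append(other)
        return other

    def emit(self, x):
        result = x if self.func is None else self.func(x)
        if self.is_sink:
            return
        for node in self.downstreams:
            node.emit(result)


def dry_run_copy(root):
    """Copy the graph downstream of root, with sinks replaced by a recorder.

    Returns the copied root and a list that collects whatever reaches a sink.
    """
    reached = []
    clones = {}

    def clone(node):
        if id(node) in clones:  # keep shared nodes shared in the copy
            return clones[id(node)]
        new = Node(node.func, node.is_sink)
        clones[id(node)] = new
        if node.is_sink:
            new.func = reached.append  # record arrival, skip the side effect
        new.downstreams = [clone(d) for d in node.downstreams]
        return new

    return clone(root), reached


written = []  # stands in for a real side effect such as writing to disk
source = Node()
source.connect(Node(lambda x: x + 1)).connect(Node(written.append, is_sink=True))

dummy_source, reached = dry_run_copy(source)
dummy_source.emit(1)
```

After the dry run, `reached` holds the value that arrived at the sink while `written` stays empty, and the original graph is untouched.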

CJ-Wright avatar Jul 29 '19 20:07 CJ-Wright

All of those sound like good ideas!

martindurant avatar Jul 29 '19 20:07 martindurant