Accumulate without emit/output
According to the documentation of accumulate (Accumulating State), it is possible to not have accumulate "output" anything:
One piece of data comes in, either one or zero pieces go out.
I tried to come up with a way to use this to "group" values together: collect values as they come in and emit a list "every now and then".
Basically I have something along the lines of this now:
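import streamz

def my_group(state, row):
    # Update the running state with the incoming row
    ...
    if <condition>:
        # Enough has been collected: output the grouped rows
        return state, [<collected_rows>]
    else:
        # Nothing to output yet (this None still travels downstream)
        return state, None

def non_empty(row):
    # Drop the None placeholders returned by my_group
    return bool(row)

# Both functions must be defined before the pipeline is built
stream = streamz.Stream()
grouped = stream.accumulate(my_group, returns_state=True, start={}).filter(non_empty)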
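To make the pattern above concrete, here is a minimal sketch of what such a grouping function could look like. It is only an illustration under assumptions: `BATCH_SIZE`, the `"rows"` key in the state dict, and the `run` helper are hypothetical names, and `run` is a plain loop that stands in for `accumulate(..., returns_state=True).filter(non_empty)` so the snippet runs without streamz installed.

```python
BATCH_SIZE = 3  # hypothetical threshold standing in for <condition>


def my_group(state, row):
    """Collect rows in the state; return a full batch, or None if not ready."""
    rows = state.get("rows", []) + [row]
    if len(rows) >= BATCH_SIZE:
        # Batch is complete: reset the state and hand the batch out
        return {"rows": []}, rows
    # Not enough rows yet: keep collecting, nothing useful to output
    return {"rows": rows}, None


def non_empty(batch):
    """Predicate that rejects the None placeholders."""
    return bool(batch)


def run(rows, start=None):
    """Plain-Python stand-in for the accumulate + filter pipeline (assumption)."""
    state = {} if start is None else start
    emitted = []
    for row in rows:
        state, out = my_group(state, row)
        if non_empty(out):
            emitted.append(out)
    return emitted, state


if __name__ == "__main__":
    batches, leftover = run(range(7))
    print(batches)   # complete batches
    print(leftover)  # rows still waiting in the state
```

In this sketch the seventh row stays in the state and is never output, which is exactly the situation where I would like accumulate itself to emit nothing.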
But I think the .filter(non_empty) step should not be necessary. Shouldn't accumulate be able to output nothing at all unless my_group actually returns a list? Would that not be the "zero pieces go out" case?
Note: I realised in the meantime that I might be able to achieve what I need using partition, but the question remains (for other use cases): how can I not emit (or output) a value from accumulate?