streamz icon indicating copy to clipboard operation
streamz copied to clipboard

Accumulate without emit/output

Open scherand opened this issue 4 years ago • 0 comments

According to the documentation of accumulate (Accumulating State), it is possible to not have accumulate "output" anything:

One piece of data comes in, either one or zero pieces go out.

I tried to come up with a way to use this to "group" values together: collect values as they come in and emit a list "every now and then".

Basically I have something along the lines of this now:

stream = streamz.Stream()
grouped = stream.accumulate(my_group, returns_state=True, start={}).filter(non_empty)


def my_group(state, row):
    ...
    if <condition>:
        return state, [<collected_rows>]
    else:
        return state, None

def non_empty(row):
    if row:
        return True
    return False

But I think the .filter(non_empty) part should not be necessary. I should be able to not return (output) a value unless I emit a list in my_group, no? Would that not be the "zero pieces go out" case?

Note: I realised in the meantime that I might be able to achieve what I need using partition, but the question remains (for other use cases): how can I not emit (or output) a value from accumulate?

scherand avatar May 09 '21 21:05 scherand