streamz

Filter DaskStream

Open CJ-Wright opened this issue 6 years ago • 7 comments

Would it be possible to have a filter DaskStream? I'm happy to put in a PR but I don't know where to start.

CJ-Wright avatar Aug 08 '18 21:08 CJ-Wright

Does it mean that we filter out futures or that we produce a stream of futures that might be null? It seems ambiguous to me.

mrocklin avatar Aug 08 '18 21:08 mrocklin

I don't know if we can filter out futures without interrogating them (and thus needing to be greedy). I was thinking about a stream which contained null/no-op values.
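To illustrate why dropping futures means interrogating them, here is a minimal sketch. It uses the standard library's `concurrent.futures` as a stand-in for Dask futures, and `greedy_filter` is a hypothetical helper, not part of streamz or Dask. Deciding whether to keep a future requires calling `.result()`, which blocks until the work is done.

```python
from concurrent.futures import ThreadPoolExecutor


def greedy_filter(predicate, futures):
    """Keep only the futures whose result satisfies `predicate`.

    Hypothetical helper: calling .result() blocks until each future has
    finished, which is the "greedy" cost of filtering futures out.
    """
    return [f for f in futures if predicate(f.result())]


with ThreadPoolExecutor(max_workers=2) as pool:
    # Square the numbers 0..5 in the background.
    futures = [pool.submit(pow, n, 2) for n in range(6)]
    # Filtering forces every future to resolve before anything is emitted.
    kept = greedy_filter(lambda x: x % 2 == 0, futures)
    kept_values = [f.result() for f in kept]
```

The filtered stream is shorter, but only because each result was waited on first.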

CJ-Wright avatar Aug 08 '18 21:08 CJ-Wright

Should that be called filter or are there other things (like producing a reduced stream of futures after interrogating them) that would have as much of a right to that name? If it's ambiguous then I'm not sure how best to proceed.

mrocklin avatar Aug 08 '18 21:08 mrocklin

We could call it something else; I'm not too attached to the name. I'm mostly interested in functionality similar to what we have in the core streamz module, where filtered-out values are never passed to downstream nodes for computation. I don't think we can take the core streamz approach, since we'd need to read the result of the future to decide whether it should be emitted. But maybe a no-op would have a similar effect: the data is emitted but produces no output.

CJ-Wright avatar Aug 08 '18 21:08 CJ-Wright

I wonder if this is just map, but with custom user code. You would have to make decisions about how to represent N/A data and such.
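A rough sketch of what "just map, with custom user code" could look like, again using `concurrent.futures` in place of Dask futures. The names `NO_VALUE`, `keep_if`, and `skip_no_value` are hypothetical and chosen only for illustration. The predicate runs inside the mapped function and returns a sentinel for rejected values, and each downstream function is wrapped so that it passes the sentinel through untouched. Nothing blocks until the final results are gathered.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sentinel marking "filtered out". A bare object() only works
# within one process; across Dask workers something picklable and comparable
# (for example None) would be needed instead.
NO_VALUE = object()


def keep_if(predicate):
    """Turn a predicate into a map step that returns NO_VALUE on rejection."""
    def step(x):
        if x is NO_VALUE:
            return NO_VALUE
        return x if predicate(x) else NO_VALUE
    return step


def skip_no_value(func):
    """Wrap a downstream function so it becomes a no-op for NO_VALUE."""
    def wrapper(x):
        return NO_VALUE if x is NO_VALUE else func(x)
    return wrapper


def pipeline(x):
    # "Filter" step followed by one downstream step, both expressed as maps.
    x = keep_if(lambda v: v % 2 == 0)(x)
    x = skip_no_value(lambda v: v * 10)(x)
    return x


with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(pipeline, n) for n in range(5)]
    results = [f.result() for f in futures]

# Every input still yields one output; the sink decides what to drop.
emitted = [r for r in results if r is not NO_VALUE]
```

The stream keeps one element per input, so the choice of how to represent N/A data stays with the user, as does the decision of where to finally drop it.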

mrocklin avatar Aug 08 '18 21:08 mrocklin

Maybe, although it seems like an undue burden to make user code handle DaskStream-specific filter logic. Maybe there is a middle layer between the two.

CJ-Wright avatar Aug 08 '18 21:08 CJ-Wright

My experience is that when faced with ambiguity, one should resist the urge to choose and instead defer to the user, keeping the core scope small. This applies only when building infrastructural libraries, though; for concrete applications it's a lot easier to be opinionated. I totally get where you're coming from.

mrocklin avatar Aug 08 '18 21:08 mrocklin