Use case: Stream arrays from redis store

Status: Open · jhamman opened this issue 8 years ago · 14 comments

We are developing a tool for streaming N-D arrays from low-level languages (Fortran/C) to Python. We are using Redis and are in the process of designing an Xarray backend (https://github.com/nbren12/geostreams/issues/6) as the user-facing API. We are exploring different stream-handling intermediaries that would let us do many of the common reactive-programming tasks on a collection of key/value pairs mapped to Xarray objects. Ideally, we would find a solution that works well with dask arrays too.

@nbren12 may want to expand my initial description.

References:

  1. Geostreams Issue
  2. Geostreams Wiki describing project

cc @nbren12, @phargogh, @ajijohn, @mrocklin

jhamman, Sep 29 '17 02:09
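The opening post leaves the storage format open. As a minimal sketch of one way an N-D array could be packed into a single key/value entry, the code below prepends a small JSON header (dtype and shape) to the raw bytes and decodes them back with `np.frombuffer`. The `encode_array`/`decode_array` helpers, the header layout, and the key name are hypothetical choices, not part of geostreams; a plain dict stands in for the Redis connection so the sketch runs without a server.

```python
import json

import numpy as np


def encode_array(arr: np.ndarray) -> bytes:
    """Pack dtype/shape metadata plus the raw bytes of `arr` into one value."""
    header = json.dumps({"dtype": arr.dtype.str, "shape": list(arr.shape)}).encode()
    # tobytes(order="C") emits C order even for Fortran-ordered input
    # (e.g. an array handed over from a Fortran model), so readers can assume C order.
    body = arr.tobytes(order="C")
    # A 4-byte little-endian length prefix tells the reader where the header ends.
    return len(header).to_bytes(4, "little") + header + body


def decode_array(payload: bytes) -> np.ndarray:
    """Inverse of encode_array; returns a read-only view over `payload`."""
    n = int.from_bytes(payload[:4], "little")
    meta = json.loads(payload[4:4 + n])
    # np.frombuffer wraps the existing bytes instead of copying them.
    flat = np.frombuffer(payload, dtype=np.dtype(meta["dtype"]), offset=4 + n)
    return flat.reshape(meta["shape"])


# Hypothetical stand-in for a Redis connection; with redis-py the equivalent
# calls would be r.set(key, value) and r.get(key).
store = {}
field = np.asfortranarray(np.arange(12, dtype=np.float32).reshape(3, 4))
store["temperature:step=0"] = encode_array(field)
restored = decode_array(store["temperature:step=0"])
print(restored.shape, restored.dtype)
```

Keeping the metadata inside the value keeps each write atomic; storing dtype and shape under separate keys would be an equally reasonable design if readers need to inspect metadata without fetching the data.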

It would be interesting to hear what API you would want a low-level streaming library to provide in order to support your project. Do existing solutions work? If not, why not?

mrocklin, Sep 29 '17 12:09

Note that the other current user group of this library handles image processing pipelines. You might want to chat with people like @CJ-Wright @danielballan and @ordirules to get a sense of their experience so far.

mrocklin, Sep 29 '17 12:09

Yes, this is interesting. We'd be happy to chat.

danielballan, Sep 29 '17 13:09

> Yes, this is interesting. We'd be happy to chat.

Did this ever happen? If not, are people free today? I'm in use-case-collecting mode for streamz, so I would find this helpful.

mrocklin, Oct 03 '17 14:10

I'm also available today.

CJ-Wright, Oct 03 '17 15:10

Yep, I should be able to make time. @mrocklin, can you email us all?

jrmlhermitte, Oct 03 '17 15:10

Any time constraints? I'm busy after 4:30pm Eastern US.

mrocklin, Oct 03 '17 15:10

I too am available.

danielballan, Oct 03 '17 15:10

None within your time constraints.

jrmlhermitte, Oct 03 '17 15:10

I have to start running for the last train by about 4:10 ET. [Otherwise no constraints.]

danielballan, Oct 03 '17 15:10

I've moved the time to 3:15 ET so @nbren12 and I can join. Hopefully that works for everyone.

jhamman, Oct 03 '17 15:10

It was fun meeting everyone today. I hope we can keep discussing some of these issues together. I just profiled the send/recv performance of Redis; check out https://gist.github.com/d7283b77634f52d6c1afd62c67bfc254. Using redis-py, I get about 1 GB/s for writes and 500 MB/s for reads. That may not be the best achievable, because redis-py might be doing some allocations behind the scenes.

nbren12, Oct 03 '17 21:10
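The gist's code is not reproduced in this thread. As a rough sketch of how a write/read throughput measurement like the one reported could be structured, the harness below times repeated `put`/`get` calls of a fixed-size payload and reports MB/s (1 MB = 10^6 bytes). The function name, key name, and payload size are hypothetical; a dict stands in for the server so the sketch runs anywhere, which makes its printed numbers meaningless (a dict only stores a reference).

```python
import os
import time
from typing import Callable, Tuple


def measure_throughput(
    put: Callable[[str, bytes], object],
    get: Callable[[str], bytes],
    nbytes: int = 16 * 2**20,
    repeats: int = 5,
) -> Tuple[float, float]:
    """Time `repeats` writes and reads of an `nbytes` payload; return (write, read) MB/s."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    payload = os.urandom(nbytes)  # random bytes as the test payload
    key = "bench:payload"  # hypothetical key name

    start = time.perf_counter()
    for _ in range(repeats):
        put(key, payload)
    write_s = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(repeats):
        result = get(key)
    read_s = time.perf_counter() - start

    if len(result) != nbytes:
        raise ValueError("read back a payload of the wrong size")
    megabytes = nbytes * repeats / 1e6
    # Guard against a zero interval on coarse clocks.
    return megabytes / max(write_s, 1e-9), megabytes / max(read_s, 1e-9)


# Dict stand-in so the sketch runs without a Redis server.
store = {}
write_mbps, read_mbps = measure_throughput(store.__setitem__, store.__getitem__, nbytes=2**20)
print(f"write {write_mbps:.0f} MB/s, read {read_mbps:.0f} MB/s")
```

Against a real server, one could pass redis-py's bound methods instead, e.g. `r = redis.Redis()` followed by `measure_throughput(r.set, r.get)`; with the default `decode_responses=False`, `r.get` returns `bytes`, so the size check applies unchanged.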

I just added pyarrow (gist). It's blazing fast, as expected!

nbren12, Oct 03 '17 22:10

Thanks for sharing!

jrmlhermitte, Oct 05 '17 13:10