[META] Streaming Indexing API

Open reta opened this issue 2 years ago • 5 comments

Is your feature request related to a problem? Please describe. The meta issue to track the Streaming Indexing API progress

Describe the solution you'd like As an outcome of https://github.com/opensearch-project/OpenSearch/issues/5001 & https://github.com/opensearch-project/OpenSearch/pull/7273, we have outlined the way such streaming support could be integrated into OpenSearch.

  • [ ] https://github.com/opensearch-project/OpenSearch/issues/9067
  • [x] https://github.com/opensearch-project/documentation-website/issues/5540
  • [ ] https://github.com/opensearch-project/OpenSearch/issues/9068
  • [ ] https://github.com/opensearch-project/OpenSearch/issues/9069
  • [ ] https://github.com/opensearch-project/OpenSearch/issues/9071
  • [ ] https://github.com/opensearch-project/OpenSearch/issues/9070
  • [ ] https://github.com/opensearch-project/OpenSearch/issues/9072
  • [ ] https://github.com/opensearch-project/OpenSearch/issues/9075
  • [ ] https://github.com/opensearch-project/OpenSearch/issues/15447

Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.

Additional context Add any other context or screenshots about the feature request here.

reta avatar Aug 02 '23 14:08 reta

@reta pretty great stuff - there's other work on improving client/server performance, how do you see us fit work on Protobuf (@VachaShah) / gRPC?

dblock avatar Aug 03 '23 16:08 dblock

@reta pretty great stuff - there's other work on improving client/server performance, how do you see us fit work on Protobuf (@VachaShah) / gRPC?

Thanks @dblock, gRPC would definitely benefit from the reactive streaming part. On the other hand, I assume gRPC would be used as the node transport layer (at least initially), so the HTTP reactive layer (as the suggested alternative transport for HTTP clients) would benefit enormously from that: we would have an end-to-end reactive processing pipeline.

reta avatar Aug 03 '23 20:08 reta

@reta Looking to collaborate on the Streaming API changes and see by when it can make it to OpenSearch release.

shwetathareja avatar Oct 16 '23 03:10 shwetathareja

@reta Looking to collaborate on the Streaming API changes and see by when it can make it to OpenSearch release.

@shwetathareja that would be great, the first thing is to get this one in: https://github.com/opensearch-project/OpenSearch/pull/9672. The pull request adds a new HTTP transport based on Reactor Netty 4 with streaming support, and it is well on schedule for 2.12. What is left is the testing part: since this transport is not the default (and is experimental), it needs some ad-hoc testing. I should be able to wrap it up this week (the two back-to-back releases derailed the plans a bit).

Once the transport is there, we could split the work, there are quite a few opportunities for doing that in parallel, thank you.

reta avatar Oct 16 '23 13:10 reta

@reta sounds good. I will also go through https://github.com/opensearch-project/OpenSearch/pull/9672 to get a better understanding. Let's connect next week around how we can split the remaining work. Looking forward to working together. Thank you!

shwetathareja avatar Oct 18 '23 02:10 shwetathareja

Hi, I'm really interested in testing this feature.

Does the current implementation support bi-directional streaming, i.e. returning responses for each chunk/document?

Currently I'm streaming the request, but OpenSearch appears to wait until the request is complete before sending the response. Not sure whether my setup is wrong, or if this is expected.

Thanks

Edit: this works as expected. My code was the issue!

T-J-L avatar Sep 19 '24 09:09 T-J-L

Does the current implementation support bi-directional streaming, i.e. returning responses for each chunk/document?

Just for visibility, yes, the implementation supports bi-directional streaming, thanks @T-J-L!

reta avatar Sep 19 '24 11:09 reta
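
For readers who want to try the chunked request side discussed above, here is a minimal sketch, not the project's reference client. It assumes the cluster runs the experimental streaming HTTP transport and exposes a streaming bulk endpoint; the path `/_bulk/stream`, the host, port, index name, and both function names are assumptions for illustration and should be checked against the documentation for your version. The request body uses the usual bulk NDJSON layout (an action line followed by a source line), split into chunks that each contain whole documents.

```python
import http.client
import json
from typing import Dict, Iterable, Iterator


def bulk_chunks(index: str, docs: Iterable[Dict], docs_per_chunk: int = 2) -> Iterator[bytes]:
    """Yield NDJSON byte chunks, each holding complete action/source line pairs."""
    lines = []
    count = 0
    for doc in docs:
        # Standard bulk format: one action line, then the document source line.
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
        count += 1
        if count == docs_per_chunk:
            yield ("\n".join(lines) + "\n").encode("utf-8")
            lines, count = [], 0
    if lines:
        # Flush the final, possibly smaller, chunk.
        yield ("\n".join(lines) + "\n").encode("utf-8")


def send_streaming_bulk(host: str, port: int, chunks: Iterable[bytes],
                        path: str = "/_bulk/stream") -> str:
    """Send the chunks with chunked transfer encoding and return the raw response body.

    The default path is an assumption. http.client writes the whole request
    before reading the response, so this helper does not show responses
    interleaved with request chunks; a reactive or async HTTP client is
    needed to observe that.
    """
    conn = http.client.HTTPConnection(host, port)
    try:
        # An iterable body without Content-Length is sent chunk-encoded.
        conn.request("POST", path, body=chunks,
                     headers={"Content-Type": "application/x-ndjson"})
        return conn.getresponse().read().decode("utf-8")
    finally:
        conn.close()
```

Usage would look like `send_streaming_bulk("localhost", 9200, bulk_chunks("demo", docs))`, with `docs` being any iterable of dictionaries. Keeping each chunk aligned on document boundaries is a design choice of this sketch, made so that a chunk can be processed without waiting for the next one.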

I didn't want to clutter this issue so created a separate one here, but any help would be appreciated 😄

thomas-long-f3 avatar Sep 24 '24 15:09 thomas-long-f3

Once the transport is there, we could split the work, there are quite a few opportunities for doing that in parallel, thank you.

@reta, any plan to build transport support? Better to release in 3.0 if this will be some breaking change.

ylwu-amzn avatar Jan 21 '25 22:01 ylwu-amzn

@reta, any plan to build transport support?

Thanks @ylwu-amzn, there are efforts to bring alternative transports (like https://github.com/opensearch-project/OpenSearch/pull/16962), but AFAIK no one is working on making the native transport streaming-capable at the moment.

Better to release in 3.0 if this will be some breaking change

Thanks @ylwu-amzn, we were able to deliver the streaming HTTP transport as a non-breaking change, and I expect this to be the case for node-to-node as well.

reta avatar Jan 21 '25 23:01 reta