grpc-java
Behaviour change in io.grpc.stub.ClientCalls.blockingServerStreamingCall between 1.56 and 1.59
What version of gRPC-Java are you using?
Upgrading from 1.56.0 to 1.59.0 in https://github.com/apache/spark/pull/43942/files#diff-6f545c33f2fcc975200bf208c900a600a593ce6b170180f81e2f93b3efb6cb3eR94
What is your environment?
Ubuntu Linux, Scala 2.13, JDK 17.
What did you expect to see?
When making a server-side streaming call on a blocking client stub, the behaviour with 1.56 was that the request would not actually be sent to the server until .hasNext() was first called on the returned iterator.
What did you see instead?
With 1.59, the request appears to be sent as soon as the server-side streaming call is invoked on the stub.
Steps to reproduce the bug
In 1.59: https://github.com/juliuszsompolski/apache-spark/commit/4cd744a075a58578bd94eee8cebeaf27aceb22ab
val iter = client.executePlan(request) // this already sends a request to the server.
In 1.56: https://github.com/juliuszsompolski/apache-spark/commit/1a9ed2eddcbb57bb00f996a35fbeef61720881d6 Edit: this actually happened only during the first call, so maybe the channel was not yet initialized before the first request, while in 1.59 the channel gets initialized eagerly?
val iter1 = client.executePlan(request1)
iter1.hasNext() // only this triggered the request to be sent to the server in 1.56
val iter2 = client.executePlan(request2) // but this sends the request to the server immediately
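For reference, here is the same pattern as a self-contained sketch against a plain grpc-java blocking stub. GreeterGrpc and HelloRequest stand in for the generated code of a hypothetical service with a server-streaming sayHelloStream RPC; they are not part of the Spark Connect client:

```scala
import io.grpc.ManagedChannelBuilder

object ReproSketch {
  def main(args: Array[String]): Unit = {
    val channel = ManagedChannelBuilder
      .forAddress("localhost", 50051)
      .usePlaintext()
      .build()

    // Hypothetical generated blocking stub for a server-streaming RPC.
    val stub = GreeterGrpc.newBlockingStub(channel)
    val request = HelloRequest.newBuilder().setName("spark").build()

    // 1.59: the request goes on the wire here, when the call is created.
    val iter = stub.sayHelloStream(request)

    // 1.56: for the first RPC on a fresh channel, nothing was sent until
    // this call, which also blocked until the first response arrived.
    while (iter.hasNext) println(iter.next())

    channel.shutdown()
  }
}
```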
Question: does this mean that calling the server-side streaming call can now throw an error, or will the error be buffered and only thrown upon the first next() or hasNext()?
~~Error handling in Apache Spark's Spark Connect project depended on the assumption that errors are only thrown from the iterator's next / hasNext, and not from creating the iterator.~~
Edit: Self-answer: it appears that errors will still only be surfaced upon the first next / hasNext. While the request is now sent to the server eagerly, the response is not read eagerly.
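A sketch of what that means in practice, reusing the hypothetical stub and request from the sketch above (StatusRuntimeException is the standard grpc-java failure type):

```scala
import io.grpc.StatusRuntimeException

// Creating the iterator now sends the request, but it does not throw,
// even if the RPC is going to fail.
val iter = stub.sayHelloStream(request)

try {
  iter.hasNext // a server or transport error is surfaced here
} catch {
  case e: StatusRuntimeException =>
    println(s"RPC failed with status: ${e.getStatus}")
}
```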
~~I am still investigating whether it also had assumptions about the request being lazy (not being sent to the server until the first next / hasNext).~~
~~Is there a setting that can revert this behaviour change?~~
Edit (self-answer): the behaviour change is fine for my use case. Before, after creating the iterator, I had to call hasNext to actually trigger the request being sent to the server; but that also blocked until the server responded with the first answer, and I believe there was no option to just trigger the send without waiting for a response. Sending the request as soon as the iterator is created works better for my use cases.
But this issue still flags that there appears to be a behaviour change. In https://github.com/apache/spark/pull/43962 I did some testing that confirms it: with grpc 1.59.0, my server-side streaming requests are sent eagerly when the iterator is created; after downgrading to grpc 1.56.0, they are no longer sent eagerly but only when the iterator is first consumed. (Edit: but actually only on the first request after channel creation.)
https://github.com/grpc/grpc-java/pull/10496 may be related.
It looks like the new behavior is more appropriate and correct. Are you just trying to understand what is going on or is there something that you would like us to do in regard to this?
If this behaviour is considered more appropriate and correct:
- The Python side should have the same behaviour as well. Python currently does not send the request when the streaming call is opened (via a generator).
- The behaviour should be documented officially.
@larry-safran agreed that the new behavior seems more appropriate and correct. In fact, the old behavior has bitten me before (it took me some debugging to understand that my server-streaming RPCs were not actually being sent from the client until I opened the iterator). But as @HyukjinKwon wrote, this should be a documented and tested behavior, and one that is consistent across different implementations of gRPC.
I think the delay was caused by ThreadlessExecutor. For a brand-new channel, the first RPC wakes up the channel and starts connecting. Messages are not serialized eagerly. Once a connection is established, the RPC is assigned to that connection and the messages are serialized. However, that serialization happens on the call's executor (where callbacks run).
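This channel lifecycle is observable through the connectivity-state API. A sketch, again assuming the hypothetical Greeter service from the repro above (ManagedChannel.getState itself is real grpc-java API):

```scala
import io.grpc.ManagedChannelBuilder

val channel = ManagedChannelBuilder
  .forAddress("localhost", 50051)
  .usePlaintext()
  .build()

// A brand-new channel has not connected to anything yet.
println(channel.getState(false)) // IDLE

// The first RPC wakes the channel and starts connecting; only once a
// transport is ready is the call assigned to it and the message serialized.
val request = HelloRequest.newBuilder().setName("spark").build()
val iter = GreeterGrpc.newBlockingStub(channel).sayHelloStream(request)
println(channel.getState(false)) // CONNECTING, then READY
```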
A while back, the blocking stubs were swapped to using ThreadlessExecutor, which just runs all callbacks on the blocked thread. This reduces latency and the number of threads used with blocking stubs. That approach works fine for unary calls, because the thread is blocked waiting on the result. But it causes trouble with the server-streaming iterator when the iterator isn't polled for periods of time. It looks like when we fixed #10490 we also fixed this, as we rightly determined the iterator was incompatible with ThreadlessExecutor.
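To illustrate the mechanism, here is a minimal sketch of the idea (not grpc-java's actual ThreadlessExecutor, just the concept it implements):

```scala
import java.util.concurrent.{Executor, LinkedBlockingQueue}

// Callbacks are queued instead of handed to a thread pool; the
// application thread that is blocked on the RPC drains the queue itself.
final class ThreadlessExecutorSketch extends Executor {
  private val queue = new LinkedBlockingQueue[Runnable]()

  override def execute(command: Runnable): Unit = queue.put(command)

  // Run by the blocked thread: process callbacks until the awaited
  // result (e.g. the unary response) is available.
  def waitAndDrain(done: () => Boolean): Unit =
    while (!done()) queue.take().run()
}
```

This is fine for unary calls, where the application thread is parked in waitAndDrain until the response arrives; with the server-streaming iterator, nothing drains the queue between hasNext calls, so queued callbacks (including the one that serializes the request) can sit there indefinitely.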
Looks like it was just a bug. And it is now fixed. I can't speak to Python, but it sounds like a bug. I suggest creating an issue for grpc-python.
FWIW, we consider the iterator API to be pretty broken. It looks easy for a very simple case, but it is bug-prone (in application code) and gets really annoying really quickly to the point it is easier/better to just use async. We're working on replacing it.
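For comparison, a sketch of the async shape of the same call, using the standard StreamObserver API with the hypothetical service from above:

```scala
import io.grpc.stub.StreamObserver

// The async stub hands each message to a callback instead of an iterator,
// so there is no blocked thread and no polling to keep the call moving.
val asyncStub = GreeterGrpc.newStub(channel)
asyncStub.sayHelloStream(request, new StreamObserver[HelloReply] {
  override def onNext(reply: HelloReply): Unit = println(reply)
  override def onError(t: Throwable): Unit = println(s"failed: $t")
  override def onCompleted(): Unit = println("done")
})
```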
Aside: @YifeiZhuang, looks like the blocking iterator would have caused issues with RejectedExecutionException. But Larry's change to stop using ThreadlessExecutor was already in place at #10574, so we didn't accidentally fix that bug; the bug is still lurking.
Release notes for v1.58.0 have been updated to include this behaviour change. Please feel free to file an issue with Python if you would like them to change their behavior.