grpc-rs
Client Stream of a streaming server can't be Boxed
I've put a repro at https://github.com/illicitonion/grpc-rs-repro-0 which uses the proto from https://github.com/googleapis/googleapis/blob/master/google/bytestream/bytestream.proto
The call to call_sync, which doesn't move the stream out of the function, reliably works.
The call to call_future, which Boxes the stream, either gives me:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: "RpcFailure(RpcStatus { status: Unknown, details: Some(\"Failed to create subchannel\") })"', src/libcore/result.rs:860:4
stack backtrace:
0: std::sys::imp::backtrace::tracing::imp::unwind_backtrace
1: std::panicking::default_hook::{{closure}}
2: std::panicking::default_hook
3: std::panicking::rust_panic_with_hook
4: std::panicking::begin_panic_new
5: std::panicking::begin_panic_fmt
6: rust_begin_unwind
7: core::panicking::panic_fmt
8: core::result::unwrap_failed
9: <core::result::Result<T, E>>::unwrap
10: grpcrepro::main
11: __rust_maybe_catch_panic
12: std::rt::lang_start
13: main
or
E1201 22:37:55.116403000 140735833875264 call.c:1731] assertion failed: grpc_cq_begin_op(call->cq, notify_tag)
Abort trap: 6
I believe both should be safe; it's possible that the Stream produced by the client doesn't properly own all of the state it needs.
That's because the channel gets dropped early. But it should return an error instead of dumping core.
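In sketch form, the failing shape looks something like this (a minimal sketch only, assuming grpcio with futures 0.1 and a client generated from bytestream.proto; ByteStreamClient, ReadRequest and ReadResponse are assumed names, and the repro's actual code may differ):

use std::sync::Arc;
use futures::Stream;
use grpcio::{ChannelBuilder, Environment};

// Works: the stream is consumed while `client` (and so the Channel and
// Environment behind it) is still alive.
fn call_sync(client: &ByteStreamClient, req: &ReadRequest) {
    let stream = client.read(req).unwrap();
    for msg in stream.wait() {
        let _ = msg.unwrap();
    }
}

// Fails: the boxed stream is returned, but `client`, the Channel and the
// Environment are all dropped when the function returns, so the stream
// no longer owns the state it needs.
fn call_future() -> Box<Stream<Item = ReadResponse, Error = grpcio::Error>> {
    let env = Arc::new(Environment::new(1));
    let ch = ChannelBuilder::new(env).connect("localhost:50051");
    let client = ByteStreamClient::new(ch);
    Box::new(client.read(&ReadRequest::default()).unwrap())
}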
If the Call output by Channel::create_call relies on the underlying Channel not being dropped, shouldn't Call either take ownership of the underlying Channel (or an Arc pointing to it), or keep a reference to it (forcing you to manage the lifetime of the Channel properly)?
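Something along these lines, purely as a hypothetical sketch (grpcio's actual Call is defined differently, and RawCall stands in for the opaque FFI call type):

// Hypothetical: a clone of the Channel (itself a cheap Arc-like handle)
// pins the underlying channel for as long as the call exists.
struct Call {
    raw: *mut RawCall, // raw grpc-core call handle
    _channel: Channel, // keeps the underlying channel alive
}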
Thanks for the suggestion. But the channel may become invalid even if a reference is kept, which would only ensure it's not dropped by the wrapper. And actually there are two kinds of calls here: client calls and server calls. For client calls, we can get the channel easily, but for server calls there is no API to obtain the underlying channel yet. So I think it's better to just return an error stating what's happening.
Besides, when env is dropped, all the completion queues are shut down too, hence keeping a reference to the channel is still useless.
We may need to explain the whole mechanism in our documentation. /cc #114
And I believe the coredump should be fixed by #125.
> Besides, when env is dropped, all the completion queues are shut down too, hence keeping a reference to the channel is still useless.
Then it sounds like a clone of the env (or at least an Arc reference to it) should be held in the Stream implementation?
While it's not a memory safety issue, it's definitely a deviation from the implicit contract of Streams, and represents something that should probably be fixed rather than documented.
> But the channel may become invalid even if a reference is kept, which would only ensure it's not dropped by the wrapper. For client calls, we can get the channel easily, but for server calls there is no API to obtain the underlying channel yet.
On the server side, this can happen in a wide variety of contexts. However, it is unlikely that the server/environment/whatever gets dropped in the middle of a call, so covering the client side would cover the large majority of cases where this is a nuisance.
Currently, simple calls such as:
let answers_client = answers_grpc::AnswersClient::open_with_defaults();
let answers_stream = answers_client
    .as_publish_opt(
        &AnswersRequest::default(),
        call_config(),
    )
    .expect("Could not get answers");
succeed, while these:
let answers_stream = answers_grpc::AnswersClient::open_with_defaults()
    .as_publish_opt(
        &AnswersRequest::default(),
        call_config(),
    )
    .expect("Could not get answers");
don't, and since no lifetimes are involved, this is a terrible trap.
I also feel like Call should hold the Arc on the ChannelInner when built through Channel::create_call().
Another alternative, which avoids the issue of the Call being different on the server side and on the client side, would be to hold the Arc in the ClientXXXReceivers.
This PR implements the latter by holding the Channel.
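Roughly, the idea is (a sketch only, not the PR's exact code):

// Each client-side receiver carries a clone of the Channel, so the
// channel cannot be released while the stream is still alive.
pub struct ClientSStreamReceiver<T> {
    call: Call,
    _channel: Channel, // pins the channel for the receiver's lifetime
    // ... message decoding state elided ...
}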
Holding the Channel inside requests can lead to resource leaks. For example, a long-running combinator of futures can prevent the Receiver or Sender from being dropped, which causes the channel to never be released.
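Concretely, the worry is something like this (a hypothetical futures 0.1 sketch, reusing the assumed ByteStreamClient from the earlier sketch):

use futures::{future, Future};

// `stream` is captured by the closure; since `empty()` never resolves,
// the closure never runs and `stream` is never dropped while `parked`
// lives. If the receiver held the Channel, parking this future on a
// long-lived executor would pin the channel (and its connection) forever.
let stream = client.read(&ReadRequest::default()).unwrap();
let parked = future::empty::<(), grpcio::Error>().map(move |_| drop(stream));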
The answers_client example is not a trap if the contract is fully understood. Not every lifetime dependency should be constrained by Rust lifetimes. For example, when using a thread pool to spawn a task, no one expects the task to still succeed after dropping the pool. It's considered a valid case for the future to resolve as a failure:
use futures::future::lazy;
use futures::Future;

// The pool is a temporary and is dropped at the end of the statement,
// so the spawned future may never run and the handle resolves to an error.
let f = tokio_threadpool::ThreadPool::new().spawn_handle(lazy(|| Ok::<_, ()>(42)));
f.wait().expect("could not finish");
To be honest, I'm not sure I would necessarily know there's a problem if I saw that tokio example somewhere. ^^'
And that is even though the need to call expect("could not finish") exists for the single reason that a future may end up in a state where it will never be executed, which undeniably helps in suspecting that behaviour.
In our case, it's pretty clear that the call can fail if there's a networking issue, if it's cancelled by the server, etc., which already justifies the presence of grpcio::Errors on the stream. So the presence of this error does not force you to think "hey, you need to ensure by yourself that you don't drop the channel" the way the wait() returning a Result does in tokio.
Also, in the case of tokio, I could imagine a scenario where one would think "I'm done with this whole set of futures, let's close its thread pool". I can't think of a similar scenario where you're in the middle of a bunch of gRPC calls and would want to kill the channel, and even less so one where having something like channel.close() wouldn't solve it.
Of course, "if the contract is fully understood", well, you know. The point is that several of us have failed to understand this contract at first sight, so maybe we could make it easier to understand, whether by changing it or not.
Finally, I don't understand the resource-leak case you are talking about in the first paragraph. If the combinator of futures isn't meant to ever resolve, why is it spawned (or not dropped) in the first place? Can you give a slightly more detailed example?
Users can close the channel for a quick shutdown or whenever suitable. The documentation needs to explain the mechanism clearly, as I suggested.