grpcbox
grpcbox copied to clipboard
Race condition when closing a stream via `grpcbox_client:close_and_recv/1`
I've got a bit of a conundrum trying to chase down a spurious error in a test suite. I'm occasionally getting timeout
back from grpcbox_client:close_and_recv/1
instead of the expected stream_finished
.
The code between grpcbox and chatterbox is a bit hard to follow. But near as I can tell, what's happening is that we're receiving the eos
message at [1] and then there's a race to get to [2] where most of the time is_process_alive/1
returns false although occasionally it can return true which leads to us getting a timeout.
At this point I think I'm just going to remove my assertion on the return of grpcbox_client:close_and_recv/1
because I can't figure out an appropriate patch given how we demonitor when receiving the eos
message. The only thing I can think of is to have a new grpcbox_client_stream:recv_end/2
call that ends up returning any messages in the mailbox until the process is dead or something? But that seemed not quite right so I figured I'd just open the issue to see if anyone else had any better ideas.
[1] https://github.com/tsloughter/grpcbox/blob/v0.9.1/src/grpcbox_client.erl#L197-L199 [2] https://github.com/tsloughter/grpcbox/blob/master/src/grpcbox_client_stream.erl#L117-L122
Hey, I'll dig into this soon, I've been wanting/needing to work on the client some more. I'll get back to you on this specific issue first. The client is quirky, I don't like how it is sending messages to the process that made the request, it should maybe be an optional setting if it is really desired to receive messages this way.
I think there is likely a hack fix to this that I can do before I full refactoring of the client.
Since the stream has ended it is assumed the process will be gone by the time it checks if the process is alive. I can change that recv with a 0 timeout to be either a different function or include an attribute to signal it is after the stream has ended.
Yeah, my only thought was to have a separate function there that would add a timeout to the recv since we're expecting it to be dead. I just don't know enough of the details to know how much of recv_msg would be normally be expected to possibly execute in those circumstances and what would more likely constitute an error.