grpc
grpc copied to clipboard
[Ruby] deflake-ruby-e2e-tests
fix #25662 and #25423
This happens when RpcServer is garbage collected before stopping, in the free function in the extension, it shuts down grpc without waiting for the thread pools that are processing the server calls to finish.
When stress running e2e tests, ClientController.new may throw exception, as a result, neither the servers started in ClientController.initialize or in ServerRunner were properly stopped, for such cases, grpc server and completion queue were shut down by ruby GC while there was an ongoing server call, this caused the assertion in completion_queue.cc as the per-call completion queue was not drained.
This PR added proper server shutdown when ClientController.new throws so the assertion will not be triggered. It also bumped the ClientController timeout so it's less likely ClientController.new will throw so the test will less likely to fail due to timeout.
PS: I tried to fix these issues by adding finalizer to rpc_server.rb to stop the server and wait the running thread to finish. However it didn't work, because when GC happens, the thread for server.run is already terminated and events left in the server call CQ cannot be processed.
@stanley-cheung @apolcyn
@veblush
@apolcyn Thank you for your comment. I tried, but it didn’t work.
I also tried to add a reference to of server to active call, have rb_call mark rb_server in the extension but they didn't prevent the server being collected I think it's because Ruby uses a mark-and-sweep GC.
What makes this problem hard to handle in ruby lib, in my opinion, is that each ruby server call has its own completion queue, the completion queues are polled by ruby threads, when GC happens, those threads are terminated, as a result when server is being collected, we shutdown the gRPC server, (unless we keep a set of all the call completion queues in the extension), we cannot properly drain those completion queues. I guess this why the maybe_shutdown_and_notify and maybe_destroy didn't work and my previous attempt with finalizers didn't work. If you have other ideas, I'd like to give it a try.
This issue/PR has been automatically marked as stale because it has not had any update (including commits, comments, labels, milestones, etc) for 30 days. It will be closed automatically if no further update occurs in 7 day. Thank you for your contributions!
This issue/PR has been automatically marked as stale because it has not had any update (including commits, comments, labels, milestones, etc) for 30 days. It will be closed automatically if no further update occurs in 7 day. Thank you for your contributions!