t/17-fork_friendliness.t hangs on Ubuntu20/aarch64

Open FGasper opened this issue 3 years ago • 3 comments

Output from the test suite when I try to install:

t/10-base_stub.t .............. ok
t/11-bidi_streaming_call.t .... ok
t/12-client_streaming_call.t .. ok
t/13-server_streaming_call.t .. ok
t/14-unary_call.t ............. ok
t/15-xs_end_to_end.t .......... 1/82 Expected hash for create_metadata_array() args at t/15-xs_end_to_end.t line 298.
Expected hash for create_metadata_array() args at t/15-xs_end_to_end.t line 341.
t/15-xs_end_to_end.t .......... ok
t/16-xs_secure_end_to_end.t ... 1/46 Expected hash for create_metadata_array() args at t/16-xs_secure_end_to_end.t line 270.
t/16-xs_secure_end_to_end.t ... ok
t/17-fork_friendliness.t ...... 1/?

FGasper, Mar 25 '22 17:03

strace isn’t much more helpful:

$ sudo strace -fyy -p13835 -p13829
strace: Process 13835 attached
strace: Process 13829 attached
[pid 13829] wait4(13835,  <unfinished ...>
[pid 13835] futex(0xffff9eaaaca8, FUTEX_WAIT_PRIVATE, 2, NULL^Cstrace: Process 13835 detached
 <detached ...>
strace: Process 13829 detached

FGasper, Mar 25 '22 17:03

Looks like the call to Grpc::XS::init() hangs.

FGasper, Mar 25 '22 17:03

The hang is caused by a race between grpc_shutdown() (called via Grpc::XS::destroy()) and fork(). Before gRPC version 1.20 (https://github.com/grpc/grpc/releases/tag/v1.20.0), grpc_shutdown() was synchronous, i.e. it blocked until shutdown was complete. From 1.20 onwards, grpc_shutdown() returns while the actual shutdown runs on a freshly spawned thread (see https://github.com/yang-g/grpc/blob/cedc76bf3833db276732e6ef0a0c5074d655f9ac/src/core/lib/surface/init.cc#L208). Shutdown may therefore still be in progress when fork() is called. If the fork happens before shutdown has completed, or without any shutdown at all, the library will be in a bad internal state in the child process. This can lead to a deadlock when the child calls Grpc::XS::destroy() (which usually happens automatically at exit).
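To make the mechanism concrete, here is a minimal sketch that does not use gRPC at all. It assumes the bad state boils down to a lock that the background shutdown thread holds at the moment of fork(); `lib_lock`, `async_shutdown_worker` and `child_inherits_held_lock` are hypothetical names for illustration, not gRPC symbols. Because fork() copies only the calling thread, the child gets a locked mutex with nobody left to unlock it. The child uses trylock so the demo reports the state instead of hanging (calling pthread functions in the child of a multithreaded process is itself outside what POSIX guarantees, so treat this as a Linux-oriented illustration).

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for a library-internal lock; NOT an actual gRPC symbol. */
static pthread_mutex_t lib_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t lock_held;    /* posted once the worker holds lib_lock */
static sem_t may_release;  /* posted when the worker may finish */

/* Models the asynchronous shutdown thread: takes the lock and stays busy. */
static void *async_shutdown_worker(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lib_lock);
    sem_post(&lock_held);
    sem_wait(&may_release);
    pthread_mutex_unlock(&lib_lock);
    return NULL;
}

/* Returns 1 if a child forked mid-"shutdown" sees lib_lock as held,
   0 if the lock was free, -1 if the child did not exit normally. */
int child_inherits_held_lock(void) {
    pthread_t worker;
    int status = 0;

    sem_init(&lock_held, 0, 0);
    sem_init(&may_release, 0, 0);
    int rc = pthread_create(&worker, NULL, async_shutdown_worker, NULL);
    assert(rc == 0);

    sem_wait(&lock_held);  /* the worker is now in the middle of shutdown */
    pid_t pid = fork();    /* only the calling thread exists in the child */
    if (pid == 0) {
        /* A plain pthread_mutex_lock() here would block forever. */
        _exit(pthread_mutex_trylock(&lib_lock) == EBUSY ? 1 : 0);
    }
    assert(pid > 0);
    waitpid(pid, &status, 0);

    sem_post(&may_release);  /* let the parent's worker finish */
    pthread_join(worker, NULL);
    sem_destroy(&lock_held);
    sem_destroy(&may_release);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `child_inherits_held_lock()` from a small `main()` should show the child observing the lock as held even though the parent's worker later releases it normally.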

A fix is to use grpc_shutdown_blocking() when available, which preserves the old behavior. I'll submit a PR that fixes this shortly.

Also, on the subject of fork safety (mentioned in the README): as long as Grpc::XS::destroy() is called and all client objects are deleted/freed before fork(), the library is safe to use in both parent and child, both before and after the fork. Naturally, both processes must (re)initialize the library after the fork. I have stress-tested this extensively, but unfortunately not in code I'm able to share.
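The ordering described above (finish shutdown, then fork, then re-initialize on both sides) can be sketched without gRPC as follows. This is a model under stated assumptions, not the library's implementation: `model_shutdown_async`, `model_shutdown_blocking`, `model_init` and `shutdown_then_fork` are hypothetical stand-ins, and the blocking variant is modelled simply as "start the teardown thread and join it", which only approximates what grpc_shutdown_blocking() guarantees (no teardown still running when it returns).

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for library-internal state; NOT an actual gRPC symbol. */
static pthread_mutex_t lib_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_t shutdown_thread;

/* Teardown work done on a background thread while holding the lock. */
static void *shutdown_worker(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lib_lock);
    /* real teardown work would happen here */
    pthread_mutex_unlock(&lib_lock);
    return NULL;
}

/* Models an asynchronous shutdown: start teardown and return at once. */
static void model_shutdown_async(void) {
    int rc = pthread_create(&shutdown_thread, NULL, shutdown_worker, NULL);
    assert(rc == 0);
}

/* Models a blocking shutdown: return only after teardown has finished. */
static void model_shutdown_blocking(void) {
    model_shutdown_async();
    pthread_join(shutdown_thread, NULL);
}

/* Models (re)initialization: returns 0 if the lock was free, -1 otherwise. */
static int model_init(void) {
    if (pthread_mutex_trylock(&lib_lock) != 0)
        return -1;
    pthread_mutex_unlock(&lib_lock);
    return 0;
}

/* Shut down, fork, then re-init on both sides.
   Returns 0 when parent and child both re-initialize successfully. */
int shutdown_then_fork(void) {
    int status = 0;

    model_shutdown_blocking();  /* no background thread is left running */
    pid_t pid = fork();
    if (pid == 0)
        _exit(model_init() == 0 ? 0 : 1);  /* child re-initializes */
    assert(pid > 0);
    waitpid(pid, &status, 0);

    int child_ok = WIFEXITED(status) && WEXITSTATUS(status) == 0;
    return (child_ok && model_init() == 0) ? 0 : -1;  /* parent re-initializes */
}
```

The point of the design is that the join happens before fork(), so the child never inherits state that a thread missing from the child was in the middle of changing.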

laustbn, Oct 31 '23 16:10