grpc-java

server awaitTermination() doesn't handle graceful shutdown for open streams

Open o-shevchenko opened this issue 1 year ago • 6 comments

What version of gRPC-Java are you using?

1.63.0

What is your environment?

RHEL Docker image, JDK 17. We use https://github.com/grpc-ecosystem/grpc-spring, which uses awaitTermination to shut down the server gracefully.

What did you expect to see?

The gRPC server supports graceful shutdown when streams are open. We use gRPC streaming to read and write data via our microservice. We expect to be able to use K8s graceful shutdown to delay killing the pod until in-progress reads/writes finish and all streams are closed, so that the connection is not dropped mid-stream.

What did you see instead?

Even though we configured graceful shutdown for both the gRPC server and the K8s pod, the gRPC server still terminates immediately after SIGTERM, even when we invoke awaitTermination().

Steps to reproduce the bug

  1. Run gRPC server in K8s pod
  2. Open gRPC stream and read data
  3. Trigger pod shutdown or just kill Java process. You can use kubectl delete pod or execute kill -TERM PID for Java process inside your pod (it should have PID 1 if you started your Java app as the main process)
  4. The shutdown hook is triggered, and we invoke awaitTermination(), but the gRPC server is terminated immediately even if we still read data via stream.

See issue: https://github.com/grpc-ecosystem/grpc-spring/issues/1110
See a similar problem described here: https://fedor.medium.com/shutting-down-grpc-services-gracefully-961a95b08f8

o-shevchenko avatar May 22 '24 13:05 o-shevchenko

Could you please try to reproduce this with v1.63.1 or v1.64.0? v1.63.0 contained a few bugs that were fixed in v1.64.0 and backported to v1.63.1: https://github.com/grpc/grpc-java/releases/tag/v1.63.1.

sergiitk avatar May 23 '24 16:05 sergiitk

Thanks for the reply, @sergiitk! Yes, I can reproduce it with version 1.63.1 as well.

o-shevchenko avatar May 23 '24 16:05 o-shevchenko

Adding a shutdown hook that calls shutdown() and awaitTermination() on the gRPC server is the correct way to get a graceful shutdown, as you have already described. We have had discussions in the past on whether to provide this ability in the gRPC server but decided against it since we are a library, not a framework, and we don't control main.

kannanjgithub avatar May 24 '24 03:05 kannanjgithub

Thanks, @kannanjgithub. But I'm not sure the issue came across from the description. We already invoke awaitTermination(), but it doesn't work as expected. It ignores open streams and just kills the server even if the client is still reading data. We are forced to add additional logic to our shutdown hooks to check for open streams on the server before invoking awaitTermination(). Could you comment on whether this is expected behaviour? Thanks!
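The extra shutdown-hook logic mentioned above (waiting for open streams before calling awaitTermination()) could look roughly like the sketch below. It is plain Java and not part of gRPC: the class name OpenStreamTracker and its methods are hypothetical, and where streamOpened()/streamClosed() get called (for example from the service implementation when a stream starts and when it completes or is cancelled) is an assumption left to the application.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not a gRPC API: counts streams the application
// considers open and lets a shutdown hook wait until the count reaches zero.
final class OpenStreamTracker {
    private int open; // guarded by the monitor of "this"

    // Call when the application starts serving a stream.
    synchronized void streamOpened() {
        open++;
    }

    // Call when a stream completes, fails, or is cancelled.
    synchronized void streamClosed() {
        if (open > 0) {
            open--;
        }
        if (open == 0) {
            notifyAll(); // wake any thread blocked in awaitIdle()
        }
    }

    synchronized int openStreams() {
        return open;
    }

    // Waits until no streams are open or the timeout elapses.
    // Returns true if idle; false on timeout or if the thread is interrupted.
    synchronized boolean awaitIdle(long timeout, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        try {
            while (open > 0) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return false;
                }
                TimeUnit.NANOSECONDS.timedWait(this, remaining);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
    }
}
```

In a shutdown hook, one possible order would be to call server.shutdown() first so no new calls are accepted, then tracker.awaitIdle(...), and only then server.awaitTermination(...). The timeout would need to stay below the pod's termination grace period.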

o-shevchenko avatar May 27 '24 08:05 o-shevchenko

Added more details @kannanjgithub :

  1. Run gRPC server in K8s pod
  2. Open gRPC stream and read data
  3. Trigger pod shutdown or just kill Java process. You can use kubectl delete pod or execute kill -TERM PID for Java process inside your pod (it should have PID 1 if you started your Java app as the main process)
  4. Shutdown hook is triggered, and we invoke awaitTermination(), BUT the gRPC server is terminated immediately even if we still read data via stream.

o-shevchenko avatar May 27 '24 08:05 o-shevchenko

We find it surprising that awaitTermination could have stopped working, since it works in the example code. Can you provide a test setup and share the GCP project with us to help debug the issue?

kannanjgithub avatar Jun 06 '24 15:06 kannanjgithub

I think I may know what's going on here. I think the last RPCs were cancelled (or deadline exceeded). gRPC then enqueued a callback to an executor and terminated because there were no more RPCs. But your application hasn't necessarily finished its processing in those callbacks.

The easiest way to solve this also follows a best-practice of providing a serverBuilder.executor() to gRPC to run callbacks so that you can limit the maximum number of threads. If you pass your own ExecutorService, then after gRPC's awaitTermination() returns true, you can wait for callbacks to complete.

// Just an example executor. gRPC uses Executors.newCachedThreadPool()
ExecutorService myExecutor = Executors.newFixedThreadPool(10);
Server server = ServerBuilder.forPort(blah)
  ...
  .executor(myExecutor)
  .build();

// In shutdown hook
server.shutdown();
server.awaitTermination(10, TimeUnit.SECONDS);
server.shutdownNow();
server.awaitTermination(10, TimeUnit.SECONDS);
// Now wait for all callbacks to complete. If you have server.awaitTermination()
// in your main(), you could do this there instead. It just needs to happen on a
// non-daemon thread.
myExecutor.shutdown();
myExecutor.awaitTermination(10, TimeUnit.SECONDS);
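The final "wait for callbacks" step can be tried out without a gRPC server. The sketch below uses only java.util.concurrent; the class CallbackDrain and its drain() method are hypothetical names, and the sleeping task merely stands in for an application callback that is still running after the server has reported termination.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper illustrating the executor-draining step only.
final class CallbackDrain {
    // Stops accepting new tasks, then waits for queued and running tasks.
    // Returns true if everything finished within the timeout.
    static boolean drain(ExecutorService executor, long timeout, TimeUnit unit) {
        executor.shutdown(); // already-submitted tasks still run to completion
        try {
            if (executor.awaitTermination(timeout, unit)) {
                return true;
            }
            executor.shutdownNow(); // timed out: interrupt whatever is left
            return false;
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        AtomicBoolean finished = new AtomicBoolean(false);

        // Stand-in for a callback that outlives the server's termination.
        executor.execute(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                return;
            }
            finished.set(true);
        });

        boolean drained = drain(executor, 5, TimeUnit.SECONDS);
        System.out.println("drained=" + drained + ", callbackFinished=" + finished.get());
    }
}
```

Note that awaitTermination() on both Server and ExecutorService throws InterruptedException, so the shutdown-hook snippet above would need a try/catch or a throws clause around those calls in real code.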

ejona86 avatar Nov 21 '24 05:11 ejona86

Seems like this is resolved. If not, comment, and it can be reopened.

ejona86 avatar Dec 09 '24 15:12 ejona86