Implement graceful shutdown for Garnet server

Open yuseok-kim-edushare opened this issue 1 month ago • 0 comments

This pull request adds a graceful shutdown mechanism to the Garnet server, covering both the Windows service and the console application. When a shutdown is requested (via service stop, Ctrl+C, or process exit), the server stops accepting new connections, waits for existing connections to finish, commits the append-only file (AOF), and takes a checkpoint if tiered storage is enabled, all within a configurable timeout. This improves data durability and operational reliability during shutdowns.

The main goal is to close #1382 and resolve #1390. This PR also reflects the review discussion at https://github.com/microsoft/garnet/pull/1383#discussion_r2535724513

Graceful Shutdown Implementation

  • Added a new ShutdownAsync method to main/GarnetServer/GarnetServer that orchestrates the graceful shutdown process: stops accepting new connections, waits for active connections to finish (with timeout), commits AOF, and takes a checkpoint if tiered storage is enabled.
  • Modified the Windows service (Worker.StopAsync) and console app (Program.Main) to use the new ShutdownAsync method, ensuring consistent and graceful shutdown behavior in both entrypoints.
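
The ordering described above can be sketched as follows. This is a minimal illustration, not the PR's code: IShutdownTarget, GracefulShutdown, and every member on them are hypothetical stand-ins, and the sketch assumes the timeout bounds only the connection-drain wait.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the server; none of these members are Garnet's actual API.
public interface IShutdownTarget
{
    void StopListening();               // stop accepting new connections
    int ActiveConnectionCount { get; }  // connections still being served
    bool AofEnabled { get; }
    bool StorageTierEnabled { get; }
    Task CommitAofAsync();
    Task TakeCheckpointAsync();
}

public static class GracefulShutdown
{
    // Returns true if all connections drained before the timeout elapsed.
    public static async Task<bool> RunAsync(IShutdownTarget server, TimeSpan timeout)
    {
        // 1. Refuse new connections first so the active count can only go down.
        server.StopListening();

        // 2. Poll until the connections drain or the timeout expires.
        using var cts = new CancellationTokenSource(timeout);
        bool drained = true;
        try
        {
            while (server.ActiveConnectionCount > 0)
                await Task.Delay(TimeSpan.FromMilliseconds(50), cts.Token);
        }
        catch (OperationCanceledException)
        {
            drained = false; // timed out; still persist data below
        }

        // 3. Persist data regardless of how the wait ended.
        if (server.AofEnabled) await server.CommitAofAsync();
        if (server.StorageTierEnabled) await server.TakeCheckpointAsync();
        return drained;
    }
}
```

Stopping the listener before waiting matters: if new connections could still arrive, the drain loop would have no guarantee of ever finishing within the timeout.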

Server Interface and Networking Enhancements

  • Extended the server interface (libs/server/Servers/IGarnetServer) and base classes to support stopping listening for new connections via a new StopListening method, and implemented this for TCP servers to close the listen socket cleanly.
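
As a rough illustration of what closing the listen socket means for a TCP server, the sketch below uses only System.Net.Sockets. ListenerSketch is a hypothetical class, not Garnet's TCP server, and it is not thread-safe.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Hypothetical TCP listener wrapper; not Garnet's actual server class.
public sealed class ListenerSketch : IDisposable
{
    private readonly Socket listenSocket;
    private bool listening;

    public ListenerSketch(IPEndPoint endpoint)
    {
        listenSocket = new Socket(endpoint.AddressFamily, SocketType.Stream, ProtocolType.Tcp);
        listenSocket.Bind(endpoint);
        listenSocket.Listen(128);
        listening = true;
    }

    public bool IsListening => listening;

    // Idempotent: closing the listen socket stops new accepts,
    // while sockets that were already accepted stay open.
    public void StopListening()
    {
        if (!listening) return;
        listening = false;
        listenSocket.Close();
    }

    public void Dispose() => StopListening();
}
```

Making the call idempotent lets both the shutdown path and Dispose invoke it without coordinating which one runs first.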

Data Durability and Checkpointing

  • Added new APIs in libs/server/Servers/StoreApi to take a checkpoint and to check if AOF or storage tier is enabled, supporting the shutdown flow for data durability.

Infrastructure and Code Quality Improvements

  • Ensured proper disposal patterns and resource cleanup, including calling base.Dispose() and suppressing finalization.
  • Updated using directives and minor code structure for clarity and consistency.
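
For readers unfamiliar with the disposal pattern mentioned above, here is a minimal sketch of a derived class that releases its own resources, calls base.Dispose(), and suppresses finalization. Both class names and the call counter are hypothetical and exist only for illustration.

```csharp
using System;

// Hypothetical base class standing in for a server base type.
public class ServerBaseSketch : IDisposable
{
    public int BaseDisposeCalls { get; private set; }

    public virtual void Dispose()
    {
        BaseDisposeCalls++; // base-level cleanup would happen here
    }
}

// Hypothetical derived server.
public sealed class TcpServerSketch : ServerBaseSketch
{
    private bool disposed;

    public override void Dispose()
    {
        if (disposed) return;      // guard against double disposal
        disposed = true;

        // Release resources owned by this class first, then let the base clean up.
        base.Dispose();

        // Cleanup is done, so the GC can skip any finalizer for this object.
        GC.SuppressFinalize(this);
    }
}
```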

Configuration and Timeout Handling

  • Added configuration for shutdown timeout in the Windows service host, defaulting to 5 seconds for graceful shutdown.
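
One common way to set such a timeout in a .NET generic host is HostOptions.ShutdownTimeout, shown below as a configuration fragment. Whether the PR uses this exact option is an assumption. The fragment requires the Microsoft.Extensions.Hosting package, and a real Windows service host would also register its worker and Windows service support.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

// Assumption: the 5-second value mirrors the default described above;
// where the PR actually sets it is not shown here.
IHost host = Host.CreateDefaultBuilder(args)
    .ConfigureServices(services =>
    {
        // ShutdownTimeout bounds how long the host waits for hosted services
        // to stop before the token passed to StopAsync is cancelled.
        services.Configure<HostOptions>(options =>
            options.ShutdownTimeout = TimeSpan.FromSeconds(5));
    })
    .Build();

await host.RunAsync();
```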

These changes collectively make server shutdowns safer and more reliable, reducing the risk of data loss or corruption during restarts or deployments.

yuseok-kim-edushare, Nov 26 '25 15:11