Graceful shutdowns - WebSocketGateways ability to run async clean up logic per connection on shutdown

Open timbo-tj opened this issue 2 years ago • 4 comments

Is there an existing issue that is already proposing this?

  • [X] I have searched the existing issues

Is your feature request related to a problem? Please describe it

I have some logic that runs on handleDisconnect. When a user disconnects, it scrubs some information from Redis and MongoDB.

I need both of these connections to remain open while I do my clean up work.

So I enable shutdown hooks for graceful shutdowns. When the server receives a termination signal, it disconnects all users who are connected to the WebSocket server, which invokes handleDisconnect for each of them. Perfect. But it doesn't wait for my work to complete before it proceeds to shut down the Redis and MongoDB connections. My handleDisconnect methods are async, though this doesn't seem to help.
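The behaviour described above is what one would expect if the disconnect handler is invoked without its returned promise being awaited. Here is a minimal framework-free sketch of that assumption. The `Gateway` interface, `disconnectAll`, and the `log` array are hypothetical names for illustration, not NestJS APIs. It shows why marking the handler `async` does not by itself delay the caller:

```typescript
// Hypothetical stand-in for a gateway; this is not the NestJS interface.
interface Gateway {
  handleDisconnect(clientId: string): Promise<void>;
}

const log: string[] = [];

const gateway: Gateway = {
  async handleDisconnect(clientId: string): Promise<void> {
    log.push(`cleanup started: ${clientId}`);
    // Simulates an async Redis/MongoDB round trip.
    await new Promise<void>((resolve) => setTimeout(resolve, 10));
    log.push(`cleanup finished: ${clientId}`);
  },
};

// Assumed caller behaviour: invoke the handler and drop the returned promise.
function disconnectAll(target: Gateway, clientIds: string[]): void {
  for (const id of clientIds) {
    void target.handleDisconnect(id);
  }
}

disconnectAll(gateway, ["a", "b"]);
log.push("caller moved on");
```

At the point where `caller moved on` is recorded, neither cleanup has finished, so anything torn down next (for example a database connection) would be gone before the handlers resume.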

The end result is that when a server shuts down, all users who are currently connected will 'leak' information into Redis and MongoDB, as I didn't get a chance to clean it up. My code is 'self-repairing' and this leaked information will eventually be cleared up, but this is just a safety net, and until the data is cleaned up there are odd quirks (like a user appearing to be in a chat lobby when they are no longer there). I do not want to rely on this behaviour every time my server restarts (which is at least once a day, as we host on Heroku).

Describe the solution you'd like

I'd like to be able to perform some async clean up work for each client that is being disconnected as a result of server shutdown. I still need access to external services like Redis and MongoDB.

Teachability, documentation, adoption, migration strategy

What is the motivation / use case for changing the behavior?

(This is a simplified example)

  • A WebSocketGateway chat server. Lobbies are managed on MongoDB
  • Users who connect are assigned a random lobby - the change is written to MongoDB
  • Users who disconnect are removed from their lobbies - the changes are written to MongoDB
  • With shutdown hooks/graceful shutdown enabled, the WebSocketGateway should be able to disconnect each user one by one and await the async disconnect operation for each one prior to termination.

timbo-tj avatar Mar 25 '23 02:03 timbo-tj

My current solution is as follows (I am testing it now and am unsure if it will work in practice, but I will keep you posted):

  • Custom adapter class (RedisIoAdapter)
app.useWebSocketAdapter(new RedisIoAdapter(app, rediosIoAdaptorConfig))
  • Adapters do not receive lifecycle events. I need to perform my clean up in beforeApplicationShutdown, as this is the only time we have before everything (MongoDB etc.) gets disconnected by NestJS.
  • Workaround: RedisModule keeps a list of RedisIoAdapters.
  • The RedisIoAdapter constructor looks up RedisModule and adds itself to the list
    constructor(
        app: INestApplicationContext,
        private config: RedisIoAdaptorConfig
    ) {

        ....

        app.resolve(RedisModule).then((module) => {
            module.registerIoAdapter(this);
        });
    }
  • RedisModule implements beforeApplicationShutdown
  • RedisModule.beforeApplicationShutdown iterates over the list of RedisIoAdapters and invokes beforeApplicationShutdown on each one. That completes the workaround.
  • RedisIoAdapter.createIOServer stores a reference to all the servers it has created.

    private servers: Server[] = [];

    createIOServer(port: number, options?: ServerOptions): any {

        this.logger.log("Creating io server...");

        ...

        const server = super.createIOServer(port, options) as Server;
        server.adapter(this.adapterConstructor);
        
        ...

        this.servers.push(server);
        
        return server;
    }
  • RedisIoAdapter.beforeApplicationShutdown iterates over the servers and closes each one.
  • We then just wait some amount of time to give our clean-up tasks a chance to complete. (This is a hack. Ideally the WebSocket servers would provide some way to signal that they are done cleaning up.)
    async beforeApplicationShutdown() {
        this.logger.log("Shutting down...");
        
        for(const server of this.servers){
            server.close();
        }

        // Wait 5 seconds to give the handleDisconnect calls time to finish their work.
        await new Promise((resolve) => setTimeout(resolve, 5000));
    }
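
The registration workaround in the list above can be sketched without NestJS as follows. `ShutdownAware` and `AdapterRegistry` are hypothetical names standing in for the adapter and for the list kept by RedisModule. Only `registerIoAdapter` and `beforeApplicationShutdown` come from the description above, and the error handling is an assumption rather than something the original code is known to do.

```typescript
// Hypothetical shape of anything that wants the shutdown hook forwarded to it.
interface ShutdownAware {
  beforeApplicationShutdown(signal?: string): Promise<void> | void;
}

// Stand-in for the list that RedisModule keeps in the workaround.
class AdapterRegistry implements ShutdownAware {
  private readonly adapters: ShutdownAware[] = [];

  registerIoAdapter(adapter: ShutdownAware): void {
    this.adapters.push(adapter);
  }

  // Forwards the hook to every registered adapter and waits for each one.
  async beforeApplicationShutdown(signal?: string): Promise<void> {
    for (const adapter of this.adapters) {
      try {
        await adapter.beforeApplicationShutdown(signal);
      } catch (err) {
        // One failing adapter should not stop the others from cleaning up.
        console.error("adapter shutdown failed", err);
      }
    }
  }
}
```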

timbo-tj avatar Mar 25 '23 02:03 timbo-tj

The above approach works for my case, but is pretty hacky and mainly just a proof of concept. The biggest issue is there is no nice way to 'know' if all the WebSocketGateways that have clean up to do have finished cleaning up. In my proof of concept I just wait 5 seconds, but in the real solution it would be nice if the async clean up logic was just awaited.
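
One way the fixed 5-second wait could be replaced, sketched under assumptions: each handleDisconnect registers its cleanup promise with a shared tracker, and the shutdown hook awaits the tracker with an upper bound. `CleanupTracker`, `track`, and `drain` are hypothetical names, not NestJS or Socket.IO APIs, and the sketch assumes the shutdown hook runs before the Redis and MongoDB connections are closed.

```typescript
class CleanupTracker {
  private readonly pending = new Set<Promise<void>>();

  // Registers a cleanup task and forgets it once it settles.
  track(task: Promise<void>): Promise<void> {
    const tracked: Promise<void> = task
      .catch((err) => console.error("cleanup failed", err))
      .then(() => {
        this.pending.delete(tracked);
      });
    this.pending.add(tracked);
    return tracked;
  }

  // Resolves to true once all tracked tasks have settled, or false on timeout.
  async drain(timeoutMs: number): Promise<boolean> {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<boolean>((resolve) => {
      timer = setTimeout(() => resolve(false), timeoutMs);
    });
    const settled = (async (): Promise<boolean> => {
      // Loop because new tasks may be tracked while earlier ones are awaited.
      while (this.pending.size > 0) {
        await Promise.all(Array.from(this.pending));
      }
      return true;
    })();
    try {
      return await Promise.race([settled, timeout]);
    } finally {
      clearTimeout(timer);
    }
  }
}
```

In this sketch, handleDisconnect would call `tracker.track(...)` with its cleanup promise, and beforeApplicationShutdown would close the servers and then `await tracker.drain(5000)`, so the 5 seconds becomes a ceiling rather than a fixed delay.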

timbo-tj avatar Mar 25 '23 03:03 timbo-tj

@timbo-tj Can you share a link to a repo that reproduces the issue, so that I can take a look and get a feel for what you are getting at?

josephaw1022 avatar Apr 06 '23 18:04 josephaw1022

Hey Joseph! Sorry I don't know how I missed your message. I will look into putting together a repo at some point, if I can find the time. Thanks!!

timbo-tj avatar Jul 31 '24 05:07 timbo-tj