pg-boss

Two phase shutdown process

Open: danilofuchs opened this issue 3 months ago • 1 comment

Our production application has 3 main entrypoints:

  1. HTTP Routes (Koa)
  2. Jobs/Crons (pg-boss)
  3. Events (event-emitter2)

All within the same process in a Docker container.

When shutting down the application gracefully, we devised three steps:

  1. Stop route handlers and job workers and drain them
  2. Drain event listeners
  3. Close connections to application DB, Redis, Sentry

Route handlers, job workers and event listeners may want to publish jobs or dispatch events at any time. As the event bus is in-memory, it needs to be available until all handlers/workers are drained.

Currently, step 1 calls pg-boss's stop({ wait: true, close: true, graceful: true }).

This causes issues if, in step 2, an event listener wants to publish a job: by then, the pg-boss DB connection is already closed.

Proposed solution

Expose a drain(timeout) method that stops workers (internally offWork() + wait with timeout) but keeps the DB open so new jobs may be published.

Allow stop() to stop the DB connection even if the worker is already stopped.
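
To make the intended call order concrete, here is a minimal sketch using a stand-in object rather than a real pg-boss instance. `drain` is the method proposed above and does not exist in pg-boss today; `FakeBoss`, `shutdown`, and `drainEventListeners` are hypothetical names, and the fake `send` and `stop` only borrow the real method names.

```javascript
// Stand-in for a pg-boss instance (assumption, not pg-boss code).
// `drain` is the PROPOSED method; it is not part of pg-boss.
class FakeBoss {
  constructor() {
    this.workersRunning = true;
    this.dbOpen = true;
    this.published = [];
  }

  // Proposed: stop workers and wait for them, but keep the DB open.
  // The fake ignores timeoutMs because it has no running jobs to wait for.
  async drain(timeoutMs) {
    this.workersRunning = false;
  }

  // Publishing only works while the DB connection is open.
  async send(queue, data) {
    if (!this.dbOpen) throw new Error("DB connection is closed");
    this.published.push({ queue, data });
  }

  // Closes the DB even though drain() already stopped the workers.
  async stop() {
    this.workersRunning = false;
    this.dbOpen = false;
  }
}

async function shutdown(boss, drainEventListeners, timeoutMs) {
  await boss.drain(timeoutMs);  // step 1: workers stopped, DB still open
  await drainEventListeners();  // step 2: listeners may still publish jobs
  await boss.stop();            // step 3: now close the DB connection
}
```

With this ordering, a listener that publishes a job during step 2 succeeds, and publishing only starts failing after step 3.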

Considered alternatives

We are going with alternative 3 for now, but would love to have your thoughts on this workflow!

1. Calling stop 2 times

    // Drain
    await this.boss.stop({
      close: false, // Keep DB connection open so we can still add jobs
      wait: true,
      graceful: true,
      timeout: timeoutMs,
    });
    // Close connections
    await this.boss.stop({
      close: true,
      wait: true,
      graceful: true,
      timeout: timeoutMs,
    });

This does not work: stop() checks whether the this.stopped flag is set and short-circuits, so the second call never closes the connection.

2. Stopping via getDb().stop()

    // Drain
    await this.boss.stop({
      close: false, // Keep DB connection open so we can still add jobs
      wait: true,
      graceful: true,
      timeout: timeoutMs,
    });
    // Close connections
    await this.boss.getDb().stop();

The issue here is that the Db interface should not expose stop(), as it is an implementation detail. We could patch it on our side, but that seems shady.

3. Implement custom Db

Copy the pg-boss implementation of Db to our code and manage the pool manually

4. offWork() + stop()

Call offWork() with each queue name (step 1) and then stop() after event listeners drain (step 3)

New jobs won't be processed during shutdown, but this does not wait for in-flight jobs to drain. Those jobs may still depend on event listeners that have already been shut down.

5. Custom worker tracking

We could track running jobs on our side and offWork/wait for them.
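
A rough sketch of what that tracking could look like, assuming every job handler is wrapped before it is registered. `InFlightTracker`, `wrap`, and `whenIdle` are hypothetical names of our own, not pg-boss APIs.

```javascript
// Counts handler invocations that are currently running and lets the
// shutdown code wait until the count drops to zero (or a timeout elapses).
class InFlightTracker {
  constructor() {
    this.active = 0;
    this.waiters = [];
  }

  // Wrap a job handler so every invocation is counted while it runs.
  wrap(handler) {
    return async (...args) => {
      this.active += 1;
      try {
        return await handler(...args);
      } finally {
        this.active -= 1;
        if (this.active === 0) {
          // Wake everyone waiting for the tracker to become idle.
          for (const notify of this.waiters.splice(0)) notify();
        }
      }
    };
  }

  // Resolves true once nothing is running, or false if timeoutMs elapses first.
  whenIdle(timeoutMs) {
    if (this.active === 0) return Promise.resolve(true);
    return new Promise((resolve) => {
      const timer = setTimeout(() => resolve(false), timeoutMs);
      this.waiters.push(() => {
        clearTimeout(timer);
        resolve(true);
      });
    });
  }
}
```

The idea would be to register handlers as tracker.wrap(handler), call offWork() for each queue in step 1, and then await tracker.whenIdle(timeoutMs) before draining event listeners. The final stop() in step 3 then only has to close the connection.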

danilofuchs • Sep 10 '25 20:09

I think the most straight-forward option is

Allow stop() to stop the DB connection even if the worker is already stopped.

I think the easiest way to build the logic would be to add another condition that considers the connection's state (if it's still open, let a close: true option continue).
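
As a toy model of that extra condition (this is not the pg-boss source; the `dbOpen` field, the `ToyBoss` class, and the `close` default are assumptions made for illustration):

```javascript
// Toy model only: NOT pg-boss code. `stopped` mirrors the flag mentioned
// in the issue; `dbOpen` is an assumed way of tracking connection state.
class ToyBoss {
  constructor() {
    this.stopped = false;
    this.dbOpen = true;
  }

  async stop({ close = true } = {}) {
    // Old guard: return as soon as `stopped` is set.
    // New guard: keep going if the caller wants to close a still-open DB.
    const needsClose = close && this.dbOpen;
    if (this.stopped && !needsClose) return;

    if (!this.stopped) {
      // ... stop workers and wait for them here ...
      this.stopped = true;
    }
    if (close) this.dbOpen = false;
  }
}
```

With a guard like this, stop({ close: false }) followed later by stop({ close: true }) would behave like alternative 1 in the original post, and any further call stays a no-op.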

timgit • Sep 12 '25 20:09