bull icon indicating copy to clipboard operation
bull copied to clipboard

Sanboxed workers doesn't trigger `failed` events when the worker was killed.

Open kennethpdev opened this issue 4 years ago • 3 comments

Description

I have a worker in a separate process (sanboxed).

const worker = path.resolve(__dirname, 'worker.js');
queue.process(2, worker);

But whenever this worker terminates or force killed like a server stop/restarts the queue failed event never triggered and I always get the error below.

0|audio-worker  | /mnt/d/Git/audio-worker/node_modules/bull/lib/process/master.js:99
0|audio-worker  |   throw err;
0|audio-worker  |   ^
0|audio-worker  | Error [ERR_IPC_CHANNEL_CLOSED]: Channel closed
0|audio-worker  |     at process.target.send (internal/child_process.js:636:16)
0|audio-worker  |     at Promise.resolve.then.err (/mnt/d/Git/audio-worker/node_modules/bull/lib/process/master.js:76:21)

sometimes its Error: write EPIPE

Different from when I choose not to make it a separate process.

const worker = require('./worker');
queue.process(2, worker);

This setup always receive the failed event and no error. Is this normal, like does sandboxed never trigger the failed event or something was wrong?

Minimal, Working Test code to reproduce the issue.

My worker is simple I can replicate it with this simple reject.

module.exports = async (job) => {
  try {
    const { id: jobId } = job;
    console.log('Job worker started', jobId);

    // simulated heavy process that takes 30seconds to complete
    await new Promise((resolve, reject) => {
      const makeError = () => {
        console.log('Killing job', jobId);
        reject(new Error('Error from long process'));
      };

      process.on('SIGTERM', makeError);
      process.on('SIGINT', makeError);

      setTimeout(resolve, 30000);
    });

    // success
    return Promise.resolve();
  } catch (err) {
    console.log('worker error', err.message);
    return Promise.reject(new Error("This should trigger the failed event"));
  }
};

for this particular test I use pm2 to start the app, and then pm2 stop <name> to properly trigger the SIGINT. But I think ctrl+c also works.

To replicate, wait for a few seconds after the log Job worker started. Then terminate the process.

Bull version

v3.13.0

Additional information

node v10.20.1 ioredis v4.16.2

kennethpdev avatar Apr 27 '20 10:04 kennethpdev

@kenma9123 Does the SIG event trigger if your process is not long running?

I'm seeing a similar behavior, but in my case the SIGTERM event handler gets fired only after the child processor completes its work. This effectively prevents any interruption of the child process while it's working, which makes it hard to implement graceful shutdown logic.

robolivable avatar Nov 20 '20 17:11 robolivable

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jul 12 '21 14:07 stale[bot]

We are also using PM2 as main process manager, that starts a sandboxed processor. When I do a pm2 restart all, the running processes wont be killed. IMO a restart should also kill the sub processes. (e.g. redeploy)

Our current code for that in the processor:

    // setup proper exit handling
    const shutdown = async () => {
      // console.log('shutdown', e);
      // ... do other tear down stuff in DB
      process.exit(410); // parent gone
    };
    process.on('SIG', shutdown);
    process.on('SIGTERM', shutdown);
    process.on('SIGINT', shutdown);

Currently SIGINT is being called when I stop PM2.

phips28 avatar Sep 07 '21 14:09 phips28