dockerode

Race condition in docker.start

Open • ckanibal opened this issue 4 years ago • 1 comment

First of all, thank you for this amazing project!

When I try to run a short-running operation in a new container with the HostConfig option AutoRemove: true, like this:

docker.run(options.image,
        [command, ...args],
        [stdout, stderr],
        {
            Tty: false,
            HostConfig: {
                AutoRemove: true,
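                // Bind-mount the working directory only when one is set
                // (spreading `false`/`undefined` into an object is a no-op)
                ...(options.customCwd && { Binds: [`${options.customCwd}:${options.customCwd}:rw`] }),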
            },
        })
        .then(([res, container]) => {
            logger.debug("Docker run successful", {res, container});
            return {
                code: res.StatusCode,
                okToCache,
                filenameTransform,
                stdout: stdout.toString(),
                stderr: stderr.toString(),
            };
        })
        .catch(error => {
            logger.warn("Docker run failed", {error});
            return error;
        });

It occasionally fails with this error:

Error: (HTTP code 404) no such container - No such container: 919455c4b42dd72b6767c129772a134616d6d8d8e7b54e9f947a0d92b74f2fa2 
    at /project/node_modules/docker-modem/lib/modem.js:301:17
    at getCause (/project/node_modules/docker-modem/lib/modem.js:331:7)
    at Modem.buildPayload (/project/node_modules/docker-modem/lib/modem.js:300:5)
    at IncomingMessage.<anonymous> (/project/node_modules/docker-modem/lib/modem.js:275:14)
    at IncomingMessage.emit (events.js:333:22)
    at IncomingMessage.EventEmitter.emit (domain.js:485:12)
    at endReadableNT (_stream_readable.js:1220:12)
    at processTicksAndRejections (internal/process/task_queues.js:84:21)

I tracked the issue down to these lines of code:

https://github.com/apocas/dockerode/blob/ed6ef39e0fc81963fedf208c7e0854d8a44cb9a8/lib/docker.js#L1486-L1494

The wait operation can fail if the container has already finished and been automatically removed between the start and wait operations. I don't have a good idea yet of how to mitigate this. Would somebody care to take a look?

ckanibal, Jul 21 '20 15:07
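The start/wait interleaving described above can be modeled without Docker. The sketch below is a hypothetical in-memory model: `FakeDaemon`, `startThenWait` and `waitThenStart` are invented for illustration and are not part of dockerode, docker-modem, or the Docker Engine API. The model collapses the timing window to zero, so a short-lived container with auto-remove always exits and disappears before `wait` is issued, which reproduces the 404 deterministically. The second helper registers the wait before starting. As far as I know, the Engine API's container wait endpoint supports this ordering through its `condition` query parameter (e.g. `next-exit`, added in API v1.30), but whether a dockerode `container.wait()` issued that way is guaranteed to reach the daemon before `start` completes would need checking.

```javascript
// Hypothetical in-memory model of the race; NOT dockerode or the Docker API.
class FakeDaemon {
  constructor() {
    this.containers = new Map(); // id -> { autoRemove, exitCode, state, waiters }
    this.nextId = 1;
  }

  create({ autoRemove, exitCode }) {
    const id = String(this.nextId++);
    this.containers.set(id, { autoRemove, exitCode, state: "created", waiters: [] });
    return id;
  }

  // A very short-lived container: it exits as soon as it starts and, with
  // autoRemove, is deleted before the caller gets a chance to call wait().
  start(id) {
    const c = this.containers.get(id);
    c.state = "exited";
    for (const resolve of c.waiters) resolve({ StatusCode: c.exitCode });
    c.waiters = [];
    if (c.autoRemove) this.containers.delete(id);
  }

  // Resolves with the exit status; rejects like the 404 in the report
  // if the container no longer exists.
  wait(id) {
    const c = this.containers.get(id);
    if (!c) {
      return Promise.reject(
        new Error(`(HTTP code 404) no such container - No such container: ${id}`)
      );
    }
    if (c.state === "exited") return Promise.resolve({ StatusCode: c.exitCode });
    return new Promise((resolve) => c.waiters.push(resolve));
  }
}

// The order docker.run uses: start, then wait. In this model it always loses.
async function startThenWait(daemon) {
  const id = daemon.create({ autoRemove: true, exitCode: 0 });
  daemon.start(id);
  return daemon.wait(id);
}

// Register interest in the exit before starting, so removal can't pre-empt it.
async function waitThenStart(daemon) {
  const id = daemon.create({ autoRemove: true, exitCode: 0 });
  const exited = daemon.wait(id);
  daemon.start(id);
  return exited;
}

startThenWait(new FakeDaemon()).catch((e) => console.log(e.message));
// logs: (HTTP code 404) no such container - No such container: 1
waitThenStart(new FakeDaemon()).then((r) => console.log(r));
// logs: { StatusCode: 0 }
```

Against a real daemon the window is only milliseconds wide, which matches the report that the failure is occasional rather than constant.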

Interesting. But I don't see an easy fix for this one.

Checking the container's status will not help. We can't assume the container finished correctly after such a short lifecycle; something could have gone wrong.

apocas, Apr 05 '21 12:04
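Since the exit status is lost once the daemon removes the container, one possible workaround is to not rely on AutoRemove at all: create, attach, start and wait as `docker.run` does, then remove the container explicitly. This is a sketch under stated assumptions, not something dockerode provides: the helper name `runAndRemove` is hypothetical, `docker` is assumed to be a dockerode `Docker` instance, and the calls used (`createContainer`, `attach`, `modem.demuxStream`, `start`, `wait`, `remove`) are dockerode methods, the first five mirroring what its own `run` does.

```javascript
// Hypothetical helper: run a one-off command WITHOUT HostConfig.AutoRemove and
// remove the container ourselves after its exit status has been read.
// `docker` is assumed to be a dockerode Docker instance.
async function runAndRemove(docker, image, cmd, [stdout, stderr], createOptions = {}) {
  const container = await docker.createContainer({
    ...createOptions,
    Image: image,
    Cmd: cmd,
    Tty: false, // multiplexed stream, so demuxStream can split stdout/stderr
    AttachStdout: true,
    AttachStderr: true,
    // Keep the caller's HostConfig (e.g. Binds) but force AutoRemove off
    HostConfig: { ...(createOptions.HostConfig || {}), AutoRemove: false },
  });
  try {
    const stream = await container.attach({ stream: true, stdout: true, stderr: true });
    container.modem.demuxStream(stream, stdout, stderr);
    await container.start();
    // The container still exists here, so wait() cannot 404 and the exit code is real
    const result = await container.wait();
    return [result, container];
  } finally {
    // Always clean up; ignore removal errors so they don't mask an earlier failure
    await container.remove({ force: true }).catch(() => {});
  }
}
```

With the snippet from the report, the call would become `runAndRemove(docker, options.image, [command, ...args], [stdout, stderr], { HostConfig: { ... } })`. The trade-off is one extra API request per run, and a container that lingers if the Node process dies between `start` and `remove`.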