rancher-desktop icon indicating copy to clipboard operation
rancher-desktop copied to clipboard

Long wait and network errors on npm install in Rancher container

Open hdodov opened this issue 1 year ago • 15 comments

Actual Behavior

Running npm ci with a large number of packages results in over 5 minutes of wait and EHOSTUNREACH or EAI_AGAIN errors.

Steps to Reproduce

  1. Clone the reproduction repo: https://github.com/hdodov/mcve-rancher-npm-hang
  2. Run npm run dev
  3. You should see one of the errors I got

Result

First attempt:

[+] Building 308.6s (8/9)                                                               docker:default
 => [payload internal] load build definition from Dockerfile                                      0.0s
 => => transferring dockerfile: 182B                                                              0.0s
 => [payload internal] load .dockerignore                                                         0.0s
 => => transferring context: 2B                                                                   0.0s
 => [payload internal] load metadata for docker.io/library/node:18-bullseye-slim                  2.6s
 => [payload internal] load build context                                                         0.0s
 => => transferring context: 72B                                                                  0.0s
 => [payload 1/5] FROM docker.io/library/node:18-bullseye-slim@sha256:f06f9ff9dcc5cfec936a7bf085  0.0s
 => CACHED [payload 2/5] WORKDIR /app                                                             0.0s
 => CACHED [payload 3/5] COPY package-lock.json .                                                 0.0s
 => CACHED [payload 4/5] COPY package.json .                                                      0.0s
 => [payload 5/5] RUN npm ci                                                                    306.0s
failed to solve: Unavailable: error reading from server: EOF

Second attempt:

[+] Building 624.4s (9/9) FINISHED                                                      docker:default
 => [payload internal] load .dockerignore                                                         0.0s
 => => transferring context: 2B                                                                   0.0s
 => [payload internal] load build definition from Dockerfile                                      0.0s
 => => transferring dockerfile: 182B                                                              0.0s
 => [payload internal] load metadata for docker.io/library/node:18-bullseye-slim                  3.4s
 => [payload 1/5] FROM docker.io/library/node:18-bullseye-slim@sha256:f06f9ff9dcc5cfec936a7bf085  0.0s
 => [payload internal] load build context                                                         0.0s
 => => transferring context: 72B                                                                  0.0s
 => CACHED [payload 2/5] WORKDIR /app                                                             0.0s
 => CACHED [payload 3/5] COPY package-lock.json .                                                 0.0s
 => CACHED [payload 4/5] COPY package.json .                                                      0.0s
 => ERROR [payload 5/5] RUN npm ci                                                              558.7s
------                                                                                                 
 > [payload 5/5] RUN npm ci:                                                                           
549.4 npm WARN deprecated [email protected]: Please upgrade  to version 7 or higher.  Older versions may use Math.random() in certain circumstances, which is known to be problematic.  See https://v8.dev/blog/math-random for details.
549.9 npm WARN deprecated [email protected]: Please upgrade  to version 7 or higher.  Older versions may use Math.random() in certain circumstances, which is known to be problematic.  See https://v8.dev/blog/math-random for details.
558.5 npm notice 
558.5 npm notice New minor version of npm available! 10.2.3 -> 10.3.0
558.5 npm notice Changelog: <https://github.com/npm/cli/releases/tag/v10.3.0>
558.5 npm notice Run `npm install -g [email protected]` to update!
558.5 npm notice 
558.5 npm ERR! code EAI_AGAIN
558.5 npm ERR! syscall getaddrinfo
558.5 npm ERR! errno EAI_AGAIN
558.5 npm ERR! request to https://registry.npmjs.org/test-exclude/-/test-exclude-6.0.0.tgz failed, reason: getaddrinfo EAI_AGAIN registry.npmjs.org

Here's an error I got when manually entering the container shell and running npm ci:

root@fe61caa23d72:/app# npm ci
npm WARN deprecated [email protected]: Please upgrade  to version 7 or higher.  Older versions may use Math.random() in certain circumstances, which is known to be problematic.  See https://v8.dev/blog/math-random for details.
npm WARN deprecated [email protected]: Please upgrade  to version 7 or higher.  Older versions may use Math.random() in certain circumstances, which is known to be problematic.  See https://v8.dev/blog/math-random for details.
npm WARN deprecated @types/[email protected]: This is a stub types definition. sharp provides its own type definitions, so you do not need this installed.
npm notice 
npm notice New minor version of npm available! 10.2.3 -> 10.3.0
npm notice Changelog: https://github.com/npm/cli/releases/tag/v10.3.0
npm notice Run npm install -g [email protected] to update!
npm notice 
npm ERR! code EHOSTUNREACH
npm ERR! syscall connect
npm ERR! errno EHOSTUNREACH
npm ERR! request to https://registry.npmjs.org/async/-/async-2.6.4.tgz failed, reason: connect EHOSTUNREACH 104.16.25.34:443

This is the complete npm log of this run: npm-EHOSTUNREACH.log.

However, when I manually ran it later, it worked:

root@5e7e6a9d2f90:/app# npm ci
npm WARN deprecated @types/[email protected]: This is a stub types definition. sharp provides its own type definitions, so you do not need this installed.
npm WARN deprecated [email protected]: Please upgrade  to version 7 or higher.  Older versions may use Math.random() in certain circumstances, which is known to be problematic.  See https://v8.dev/blog/math-random for details.
npm WARN deprecated [email protected]: Please upgrade  to version 7 or higher.  Older versions may use Math.random() in certain circumstances, which is known to be problematic.  See https://v8.dev/blog/math-random for details.

added 1423 packages, and audited 1424 packages in 43s

224 packages are looking for funding
  run `npm fund` for details

found 0 vulnerabilities

Expected Behavior

The whole container build should take about 50 seconds, as it does with OrbStack and Docker Desktop.

Additional Information

No response

Rancher Desktop Version

1.12.1

Rancher Desktop K8s Version

deactivated

Which container engine are you using?

moby (docker cli)

What operating system are you using?

macOS

Operating System / Build Version

macOS Sonoma 14.3

What CPU architecture are you using?

x64

Linux only: what package format did you use to install Rancher Desktop?

None

Windows User Only

No response

hdodov avatar Jan 24 '24 15:01 hdodov

This might be related to #6358, which also appears to occur due to network errors in the container.

hdodov avatar Jan 24 '24 15:01 hdodov

I can reproduce this behaviour with QEMU. I tried giving the VM more resources by running with 8 CPUs and 16GB of memory, but the result was unchanged.

It seems to work fine when using VZ emulation (and virtiofs for the volume sharing), but the build tooks still over 150s instead of the expected 50s:

$ npm run dev

> dev
> docker compose up --build

[+] Building 153.2s (11/11) FINISHED                                                                   docker:rancher-desktop
 => [payload internal] load .dockerignore                                                                                0.0s
 => => transferring context: 2B                                                                                          0.0s
 => [payload internal] load build definition from Dockerfile                                                             0.0s
 => => transferring dockerfile: 219B                                                                                     0.0s
 => [payload internal] load metadata for docker.io/library/node:18-bullseye-slim                                         0.9s
 => [payload auth] library/node:pull token for registry-1.docker.io                                                      0.0s
 => [payload internal] load build context                                                                                0.0s
 => => transferring context: 146B                                                                                        0.0s
 => [payload 1/5] FROM docker.io/library/node:18-bullseye-slim@sha256:f06f9ff9dcc5cfec936a7bf08580c5d9cf0f424fbbd857d36  0.0s
 => CACHED [payload 2/5] WORKDIR /app                                                                                    0.0s
 => CACHED [payload 3/5] COPY package-lock.json .                                                                        0.0s
 => CACHED [payload 4/5] COPY package.json .                                                                             0.0s
 => [payload 5/5] RUN npm ci                                                                                           149.7s
 => [payload] exporting to image                                                                                         2.6s
 => => exporting layers                                                                                                  2.6s
 => => writing image sha256:7dfde214f314799120a5e9894e9188db49ce7130b2f5077a070eab65ffc8e0cd                             0.0s
 => => naming to docker.io/library/mcve-rancher-npm-hang-payload                                                         0.0s
[+] Running 2/2
 ✔ Network mcve-rancher-npm-hang_default      Created                                                                    0.0s
 ✔ Container mcve-rancher-npm-hang-payload-1  Created                                                                    0.0s
Attaching to payload-1

jandubois avatar Jan 24 '24 23:01 jandubois

I can reproduce this behaviour with QEMU. I tried giving the VM more resources by running with 8 CPUs and 16GB of memory, but the result was unchanged.

I forgot to mention: the Docker socket forwarding seems to have stopped while running npm run dev with QEMU, so the result is essentially just a timeout after 5min.

I could still get a shell inside the VM, and the moby engine was still running, but the image had not yet been built.

jandubois avatar Jan 24 '24 23:01 jandubois

It looks like buildkit is not properly enabled by default in Rancher Desktop; enabling it explicitly in the compose file brings the build time down to 31s (after a factory reset, but again using VZ and virtiofs):

--- docker-compose.yml
+++ docker-compose.yml
@@ -3,5 +3,7 @@ services:
   payload:
     build:
       context: ./
+      args:
+        DOCKER_BUILDKIT: 1
     ports:
       - 3000:3000

and

$ npm run dev

> dev
> docker compose up --build

[+] Building 37.7s (11/11) FINISHED                                                                    docker:rancher-desktop
 => [payload internal] load .dockerignore                                                                                0.0s
 => => transferring context: 2B                                                                                          0.0s
 => [payload internal] load build definition from Dockerfile                                                             0.0s
 => => transferring dockerfile: 219B                                                                                     0.0s
 => [payload internal] load metadata for docker.io/library/node:18-bullseye-slim                                         1.6s
 => [payload auth] library/node:pull token for registry-1.docker.io                                                      0.0s
 => [payload internal] load build context                                                                                0.0s
 => => transferring context: 1.11MB                                                                                      0.0s
 => [payload 1/5] FROM docker.io/library/node:18-bullseye-slim@sha256:f06f9ff9dcc5cfec936a7bf08580c5d9cf0f424fbbd857d36  2.3s
 => => resolve docker.io/library/node:18-bullseye-slim@sha256:f06f9ff9dcc5cfec936a7bf08580c5d9cf0f424fbbd857d3677cff575  0.0s
 => => sha256:5ae20c369ec182f5275f874054411f9bfb3f8132ff52a9c1d08d4ae15494d01c 30.06MB / 30.06MB                         0.8s
 => => sha256:01274fed796f4fcde21eb139c21923f0577bf03c6aca3de5fb6cca7997c026c7 4.19kB / 4.19kB                           0.2s
 => => sha256:66f24c6ac6abeb092db3801126aeedab6af5fb7289ad2c67fdf187e36b658e1d 38.67MB / 38.67MB                         0.7s
 => => sha256:f06f9ff9dcc5cfec936a7bf08580c5d9cf0f424fbbd857d3677cff575ff340db 1.21kB / 1.21kB                           0.0s
 => => sha256:9440185cd6578a314ca5d32b315bea09d55c2050442590ace64c121170c009e8 1.37kB / 1.37kB                           0.0s
 => => sha256:5b233b1f1a3b8a53bebaf24ca9d537508f23e4a1f5bf6d77e5aa334e8b42a7fc 7.64kB / 7.64kB                           0.0s
 => => sha256:e8d32e99fce4825fc4fa6badbf7514943681c2fc1bcdab3f10c0b3047541122b 2.70MB / 2.70MB                           0.5s
 => => sha256:1a8818020ea28916649820eb93d259bce8ee9113797d5d2dbbbc6378612fabe1 448B / 448B                               0.6s
 => => extracting sha256:5ae20c369ec182f5275f874054411f9bfb3f8132ff52a9c1d08d4ae15494d01c                                0.6s
 => => extracting sha256:01274fed796f4fcde21eb139c21923f0577bf03c6aca3de5fb6cca7997c026c7                                0.0s
 => => extracting sha256:66f24c6ac6abeb092db3801126aeedab6af5fb7289ad2c67fdf187e36b658e1d                                0.8s
 => => extracting sha256:e8d32e99fce4825fc4fa6badbf7514943681c2fc1bcdab3f10c0b3047541122b                                0.1s
 => => extracting sha256:1a8818020ea28916649820eb93d259bce8ee9113797d5d2dbbbc6378612fabe1                                0.0s
 => [payload 2/5] WORKDIR /app                                                                                           0.1s
 => [payload 3/5] COPY package-lock.json .                                                                               0.0s
 => [payload 4/5] COPY package.json .                                                                                    0.0s
 => [payload 5/5] RUN npm ci                                                                                            31.0s
 => [payload] exporting to image                                                                                         2.7s
 => => exporting layers                                                                                                  2.7s
 => => writing image sha256:7e939011152059397661a7baef53465e78b9132d2f0593b2852cecbee51da2cf                             0.0s
 => => naming to docker.io/library/mcve-rancher-npm-hang-payload                                                         0.0s
[+] Running 2/2
 ✔ Network mcve-rancher-npm-hang_default      Created                                                                    0.0s
 ✔ Container mcve-rancher-npm-hang-payload-1  Created                                                                    0.0s
Attaching to payload-1
^CGracefully stopping... (press Ctrl+C again to force)

jandubois avatar Jan 24 '24 23:01 jandubois

I did one more test with VZ emulation, but still using reverse-sshfs for the volume sharing. It worked the same way as using virtiofs, so the problem seems to be due to QEMU.

In summary there are 2 issues:

  • the docker socket forwarding gets disconnected with QEMU (breaking)
  • buildkit is not configured correctly to be used by default (slowdown)

The workaround for now is to use VZ emulation and configure buildkit explicitly.

jandubois avatar Jan 24 '24 23:01 jandubois

Let's use #1541 to deal with the buildkit configuration and keep this issue for dealing with the broken docker.sock when using QEMU.

jandubois avatar Jan 29 '24 18:01 jandubois

It looks like buildkit is not properly enabled by default in Rancher Desktop;

This turned out to be wrong. The reason that buildkit was not being used was because I still had a custom builder installed from investigating a different issue, and that builder was also made the default:

$ docker buildx ls
NAME/NODE         DRIVER/ENDPOINT  STATUS  BUILDKIT        PLATFORMS
amd64builder *    docker-container
  amd64builder0   rancher-desktop  running v0.12.5         linux/arm64, linux/amd64, linux/amd64/v2
default           docker
  default         default          running v0.10.6+d52b2d5 linux/arm64, linux/amd64, linux/amd64/v2
rancher-desktop   docker
  rancher-desktop rancher-desktop  running v0.10.6+d52b2d5 linux/arm64, linux/amd64, linux/amd64/v2

The builder config is not deleted by a factory reset (I've created #6408 to change this), so it affected my testing for this issue as well by running the build inside a container.

After deleting the custom builder, the docker compose up --build worked normally and compled the npm ci command in under 40s.

Deleting the builder did not help with the QEMU issue though; it would still lose the socket connection and eventually time out.

jandubois avatar Feb 01 '24 19:02 jandubois

I am having the same issue. Using QEMU the my Docker crashes. Using VZ at least install finishes but it is taking a very long time. Running with npm install --loglevel verbose i can see that the HTTP GET calls are taking progresivelly longer ang longer (eventually each GET is more than 1 minute).

thecopy avatar Feb 17 '24 09:02 thecopy

I'm having this issue using Mac Os M1 Sonoma 14.1.1. Rancher Desktop using QEMU virtualization used to work 😢

Eddcp avatar Feb 21 '24 02:02 Eddcp

I'm having the same issue on Mac OS 14.4.1 with intel core. Earlier docker build with npm install used to work.

containeers avatar Apr 02 '24 17:04 containeers

Is there any update here? We too are seeing users running into this issue where with Docker Desktop the builds would be 1/10 the time.

rchaudha avatar Apr 19 '24 19:04 rchaudha

Is there any update here?

No update, unfortunately. Nobody has had time to work on it yet.

jandubois avatar Apr 19 '24 23:04 jandubois

@jandubois Downgrading Rancher Desktop will not solve the problem. A factory reset led me to the same problem and critical blocked. The Rancher desktop is not linked to the strict tested version of Qemu, but on factory reset it downloads the latest?

Intel Mac i7, MacOs Monterey 12.7.4

nino-magazinovic avatar May 09 '24 11:05 nino-magazinovic