docker-node

npm hangs on linux/s390x containers

Open hardillb opened this issue 1 year ago • 22 comments

Environment

  • Platform: linux/s390x
  • Docker Version: 24.0.6, build ed223bc
  • Node.js Version: 18
  • Image Tag: 18-alpine

Expected Behavior

npm install runs and packages are installed.

Current Behavior

Building a container for the linux/s390x platform hangs during npm install, with npm consuming 100% CPU.

Previous builds completed in under 5 minutes; the current build has been running for over an hour.

We are building the https://github.com/node-red/node-red-docker container with

docker buildx build --platform linux/s390x --file .docker/Dockerfile.alpine --build-arg NODE_VERSION=18 .
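
Since the failure mode is a hang rather than an error, it can help to bound the build so CI fails fast instead of running for an hour. A minimal sketch, assuming GNU coreutils `timeout` is available on the build host; `run_with_timeout` is a hypothetical helper name, not something from docker-node or buildx:

```shell
#!/bin/sh
# run_with_timeout LIMIT CMD [ARGS...]
# Runs CMD under `timeout` and reports when the limit was hit.
# GNU timeout exits with status 124 when it had to kill the command.
run_with_timeout() {
    limit=$1
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "command exceeded ${limit}; treating it as hung" >&2
    fi
    return "$status"
}
```

For the build above this could be called as `run_with_timeout 15m docker buildx build --platform linux/s390x --file .docker/Dockerfile.alpine --build-arg NODE_VERSION=18 .`; the 15-minute limit is an arbitrary choice, roughly three times the usual build time.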

Possible Solution

Steps to Reproduce

  • Check out https://github.com/node-red/node-red-docker
  • cd node-red-docker
  • run docker buildx build --platform linux/s390x --file .docker/Dockerfile.alpine --build-arg NODE_VERSION=18 .

Additional Information

Same thing is happening with 14-alpine and 16-alpine tags

I'm hitting this both locally and in a GH Action, both of which use Qemu to support building for alternate architectures.

hardillb, Oct 04 '23 13:10

I have a similar issue (see Dockerfile).

I wonder whether the problem of #1798 and #1829 finally snuck into 18 and earlier images.

tyranron, Oct 04 '23 13:10

Interesting. I've just fired up the Docker images (node:16-alpine and node:18-alpine) on a real s390x system and npm seems to install without any problems. That suggests something specific to qemu or to the Docker version in use (mine is Docker version 24.0.5, build 24.0.5-0ubuntu1~22.04.1).

sxa, Oct 04 '23 13:10

Just tried with your dockerfile - went through without problems: build18.log.gz Command: docker build --file .docker/Dockerfile.alpine --build-arg NODE_VERSION=18 . 2>&1 | tee build18.log

sxa, Oct 04 '23 14:10

Which does appear to point to this possibly being a qemu based problem. I know my laptop got a recent set of qemu packages, but not sure what would be needed to debug this. Any pointers would be helpful

hardillb, Oct 04 '23 14:10

@hardillb setup-qemu-action uses the tonistiigi/binfmt Docker image to install QEMU binaries. I think other versions like 6.1.0 or master could be tried to "resolve" this, at least on GitHub Actions.
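
For local testing, a specific QEMU version can be registered by pinning the image tag. A rough sketch, assuming the `qemu-vX.Y.Z` tag naming and the `--install` flag that appear elsewhere in this thread; `binfmt_install_cmd` is a hypothetical helper that only prints the command, so it can be reviewed before anything privileged is run:

```shell
#!/bin/sh
# binfmt_install_cmd TAG ARCH
# Prints the docker command that would register QEMU for ARCH using the
# tonistiigi/binfmt image at TAG. Nothing is executed here.
binfmt_install_cmd() {
    tag=$1
    arch=$2
    printf 'docker run --privileged --rm tonistiigi/binfmt:%s --install %s\n' \
        "$tag" "$arch"
}
```

For example, `binfmt_install_cmd qemu-v6.1.0 s390x` prints the command for the 6.1.0 build; pass the output to `sh -c` once it looks right.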

tyranron, Oct 04 '23 14:10

master doesn't appear to fix it for me, testing 6.1.0

hardillb, Oct 04 '23 15:10

no joy with qemu-v6.1.0 either so this may be a NodeJS + Qemu issue

hardillb, Oct 04 '23 16:10

OK, while this appears to be limited to when running builds using qemu, this is going to be the default way 99% of CI builds run that target s390x, so I think we still need to track this down, even if it's just to raise a sensible upstream issue against qemu.

  • What debug options can I enable to try and get some useful debug information here?
  • Would trying to connect GDB to the spinning process help?

hardillb, Oct 05 '23 08:10

@hardillb it seems that after moby/buildkit#1516 we may skip setup-qemu-action, because BuildKit supports QEMU emulation out of the box. Moreover, judging by the tonistiigi/binfmt Docker image tags, newer versions of QEMU are released for buildkit- images only. The last one is 7.1.0.

However, for my repository the result is still the same, no matter which version is used: 6.0.0, 6.1.0, 6.2.0, 7.0.0, 7.1.0 or master.

tyranron, Oct 05 '23 09:10

@hardillb in my case, the problem seems to be related to Linux only, somehow. I was able to resolve the issue just by switching to the macos-latest runner for the architectures where the build gets stuck.

I will try this workaround for #1798 too, and will report the results.

tyranron, Oct 05 '23 14:10

@tyranron did you get any joy using the docker.io/ prefix on the base containers?

If it is the qemu bug https://gitlab.com/qemu-project/qemu/-/issues/1729 then hopefully it gets fixed soon.

hardillb, Oct 16 '23 09:10

@hardillb

did you get any joy using the docker.io/ prefix on the base containers?

These are the same images, no?

I will try this workaround for https://github.com/nodejs/docker-node/issues/1798 too, and will report the results.

Building under macos-latest runner didn't work out for Node.js 20, but for 18 it fixed my problem.

tyranron, Oct 16 '23 10:10

This may not be the same as the other qemu bug as it's not calling mremap.

I ran the following command:

docker run --platform linux/s390x -it --cap-add=SYS_PTRACE -e QEMU_STRACE=true -e QEMU_LOG_FILENAME=qemu.log -v ./qemu.log:/qemu.log --rm node:18-alpine npm install node-red:3.1.0

and got the following strace:

1 clock_gettime(CLOCK_MONOTONIC,0x0000004007b62cd8) = 0 ({tv_sec = 2223689,tv_nsec = 708242416})
1 clock_gettime(CLOCK_MONOTONIC,0x0000004007b62db8) = 0 ({tv_sec = 2223689,tv_nsec = 708269945})
1 clock_gettime(CLOCK_MONOTONIC,0x0000004007b62db8) = 0 ({tv_sec = 2223689,tv_nsec = 708530498})
1 clock_gettime(CLOCK_MONOTONIC,0x0000004007b62eb0) = 0 ({tv_sec = 2223689,tv_nsec = 708556031})
1 clock_gettime(CLOCK_MONOTONIC,0x0000004007b62eb0) = 0 ({tv_sec = 2223689,tv_nsec = 708593060})
1 munmap(0x00000040101ef000,57344) = 0
1 clock_gettime(CLOCK_MONOTONIC,0x0000004007b63028) = 0 ({tv_sec = 2223689,tv_nsec = 708653152})
1 socket(PF_NETLINK,SOCK_RAW|SOCK_CLOEXEC,NETLINK_ROUTE) = 24
1 sendto(24,275007277688,20,0,0,0) = 20
1 recvfrom(24,275007277688,8192,64,0,0) = 2880

qemu.log
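
A full qemu.log gets long quickly, so a per-syscall tally makes it easier to see what the process is spinning on. A minimal sketch, assuming every line has the `PID name(args) = result` shape shown in the excerpt above; `summarize_strace` is a hypothetical helper name:

```shell
#!/bin/sh
# summarize_strace FILE
# Counts syscalls in QEMU_STRACE output and lists the most frequent first.
summarize_strace() {
    awk '{
        name = $2
        sub(/\(.*/, "", name)   # keep only the syscall name
        if (name != "") count[name]++
    }
    END { for (n in count) print count[n], n }' "$1" | sort -rn
}
```

Running `summarize_strace qemu.log | head` prints one `COUNT NAME` pair per line; a single call dominating the list is a hint about where the loop is.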

hardillb, Oct 17 '23 09:10

This looks to be spinning trying to receive data from the network. How do we move this forward?

hardillb, Oct 19 '23 16:10

Due to https://github.com/tonistiigi/binfmt/pull/120 we now have QEMU 8.0 in the tonistiigi/binfmt:master Docker image. I tried it with the node:21 Docker image, and still no luck.

tyranron, Oct 25 '23 14:10

I started seeing this issue on September 19th, 2023.

I created a repo to help diagnose the problem, or to detect when a fix is made upstream. It runs daily tests on two versions of node across six architectures on Debian and Alpine. It simply attempts npm -v.
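
The same kind of check can be approximated locally. This is only a sketch of the idea, not the code from that repo: `probe_npm`, `PROBE_RUNNER` and `PROBE_TIMEOUT` are hypothetical names, and the 120-second default is an arbitrary guess at how long a healthy emulated `npm -v` could take.

```shell
#!/bin/sh
# probe_npm IMAGE PLATFORM...
# Runs `npm -v` in IMAGE for each PLATFORM under a time limit and prints
# one result line per platform. A non-zero exit or a timeout both count
# as a failure.
probe_npm() {
    image=$1
    shift
    for platform in "$@"; do
        if timeout "${PROBE_TIMEOUT:-120}" "${PROBE_RUNNER:-docker}" run --rm \
            --platform "$platform" "$image" npm -v >/dev/null 2>&1; then
            echo "$platform ok"
        else
            echo "$platform failed-or-hung"
        fi
    done
}
```

For the platforms mentioned in this thread: `probe_npm node:18-alpine linux/s390x linux/arm/v6 linux/arm/v7`.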

On Nov 7: 4 of the 12 Alpine combinations are failing.

See:

  • https://github.com/felddy/npm-hang-test
  • https://github.com/felddy/npm-hang-test/issues/2

felddy, Nov 07 '23 20:11

I reported a similar problem in ticket #1946.

ozbillwang, Nov 15 '23 02:11

I've been playing with this again (as it's still a problem). I've been using AWS EC2 machines to try out a few different options.

  • It fails on both Intel and AMD based x86_64 hardware
  • It fails on AWS Arm64 hardware as well
  • It fails with the latest 8.0.6 qemu builds (as provided by the qemu-v8.0.4 tag of tonistiigi/binfmt)
  • I've tried Ubuntu 22.04 and 23.10 base OS builds

hardillb, Dec 26 '23 22:12

I tried to run it on ubuntu-20.04 s390x and it works fine, but arm/v6 and arm/v7 still don't work, only alpine3.18 and nodejs18. https://github.com/whyour/qinglong/actions/runs/7375782137/job/20067750407

whyour, Jan 01 '24 07:01

I tried to reproduce the problem on my macbook and it seems to be working for me: FYI this is what I get:

mac-jan:tmp jan$ git clone https://github.com/node-red/node-red-docker
Cloning into 'node-red-docker'...
remote: Enumerating objects: 3154, done.
remote: Counting objects: 100% (225/225), done.
remote: Compressing objects: 100% (107/107), done.
remote: Total 3154 (delta 133), reused 197 (delta 118), pack-reused 2929
Receiving objects: 100% (3154/3154), 823.97 KiB | 1.99 MiB/s, done.
Resolving deltas: 100% (1988/1988), done.
mac-jan:tmp jan$ ls
node-red-docker
mac-jan:tmp jan$ cd node-red-docker/
mac-jan:node-red-docker jan$ docker buildx build --platform linux/s390x --file .docker/Dockerfile.alpine --build-arg NODE_VERSION=18 .
[+] Building 331.6s (20/20) FINISHED                                                                                                                 docker-container:build
 => [internal] load build definition from Dockerfile.alpine                                                                                                            0.1s
 => => transferring dockerfile: 3.55kB                                                                                                                                 0.0s
 => [internal] load metadata for docker.io/library/node:18-alpine                                                                                                      4.0s
 => [auth] library/node:pull token for registry-1.docker.io                                                                                                            0.0s
 => [internal] load .dockerignore                                                                                                                                      0.0s
 => => transferring context: 2B                                                                                                                                        0.0s
 => [base  1/11] FROM docker.io/library/node:18-alpine@sha256:b1a0356f7d6b86c958a06949d3db3f7fb27f95f627aa6157cb98bc65c801efa2                                        18.3s
 => => resolve docker.io/library/node:18-alpine@sha256:b1a0356f7d6b86c958a06949d3db3f7fb27f95f627aa6157cb98bc65c801efa2                                                0.0s
 => => sha256:8f566b0cf37515471460f9658e9c86f83bf2350169945c1f2b328eec90ccac61 449B / 449B                                                                             2.0s
 => => sha256:743d88e4fdd9423c2822f8530204bc76ce14cdcfc62b97fe81bb5a6115485080 2.34MB / 2.34MB                                                                        18.0s
 => => sha256:9d91f34cd4b1eccf088eefcf662f235b2c0ae325b9739b7f7f7e875c25ba8643 41.11MB / 41.11MB                                                                      11.1s
 => => sha256:0fca3ee44ced87b7184bc23390283fdf10cfae0e844a25b785dd11c463815227 3.24MB / 3.24MB                                                                         3.3s
 => => extracting sha256:0fca3ee44ced87b7184bc23390283fdf10cfae0e844a25b785dd11c463815227                                                                              0.2s
 => => extracting sha256:9d91f34cd4b1eccf088eefcf662f235b2c0ae325b9739b7f7f7e875c25ba8643                                                                              2.3s
 => => extracting sha256:743d88e4fdd9423c2822f8530204bc76ce14cdcfc62b97fe81bb5a6115485080                                                                              0.1s
 => => extracting sha256:8f566b0cf37515471460f9658e9c86f83bf2350169945c1f2b328eec90ccac61                                                                              0.0s
 => [internal] load build context                                                                                                                                      0.1s
 => => transferring context: 7.81kB                                                                                                                                    0.0s
 => [base  2/11] COPY .docker/scripts/*.sh /tmp/                                                                                                                       0.0s
 => [base  3/11] COPY .docker/healthcheck.js /                                                                                                                         0.0s
 => [base  4/11] RUN set -ex &&     apk add --no-cache         bash         tzdata         iputils         curl         nano         git         openssl         open  8.1s
 => [base  5/11] WORKDIR /usr/src/node-red                                                                                                                             0.0s 
 => [base  6/11] COPY .docker/known_hosts.sh .                                                                                                                         0.0s 
 => [base  7/11] RUN ./known_hosts.sh /etc/ssh/ssh_known_hosts && rm /usr/src/node-red/known_hosts.sh                                                                 71.6s 
 => [base  8/11] RUN echo "PubkeyAcceptedKeyTypes +ssh-rsa" >> /etc/ssh/ssh_config                                                                                     0.2s 
 => [base  9/11] COPY package.json .                                                                                                                                   0.0s 
 => [base 10/11] COPY flows.json /data                                                                                                                                 0.1s 
 => [base 11/11] COPY .docker/scripts/entrypoint.sh .                                                                                                                  0.1s 
 => [build 1/1] RUN apk add --no-cache --virtual buildtools build-base linux-headers udev python3 &&     npm install --unsafe-perm --no-update-notifier --no-audit   178.4s 
 => [release 1/3] COPY --from=build /usr/src/node-red/prod_node_modules ./node_modules                                                                                 0.8s 
 => [release 2/3] RUN chown -R node-red:root /usr/src/node-red &&     /tmp/install_devtools.sh &&     rm -r /tmp/*                                                    41.5s 
 => [release 3/3] RUN npm config set cache /data/.npm --global                                                                                                         7.3s 
WARNING: No output specified with docker-container driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load                                                                                                                                                      
mac-jan:node-red-docker jan$

FYI My macbook docker setup:

1/ I have installed lima (so I don't use docker desktop)

# install lima
brew install lima

# create default lima instance with 6GB memory using docker template
limactl start --name=default --set='.cpus = 4 | .memory = "6GiB" | .disk = "100GiB" ' template://docker

# create docker context - note that the actual unix socket path is returned by the previous command.
docker context create colima --docker "host=unix:///Users/jan/.lima/default/sock/docker.sock"

# starts the docker environment on my macbook.
limactl start

2/ I have installed Docker Buildx as follows:

# in folder /Users/jan/.docker/cli-plugins
wget https://github.com/docker/buildx/releases/download/v0.10.3/buildx-v0.10.3.darwin-amd64
mv buildx-v0.10.3.darwin-amd64 docker-buildx
chmod a+x docker-buildx

Add binfmt_misc support for additional platforms as specified in https://docs.docker.com/build/building/multi-platform/

 docker run --privileged --rm tonistiigi/binfmt --install all
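
To confirm the registration took effect, the kernel's binfmt_misc entries can be inspected on the Linux host (or inside the Lima VM on a Mac). A minimal sketch, assuming the handlers are registered under names like `qemu-s390x`; `qemu_handler_enabled` is a hypothetical helper, and its optional second argument exists only so it can be tried against a scratch directory:

```shell
#!/bin/sh
# qemu_handler_enabled ARCH [DIR]
# Succeeds if a binfmt_misc entry named qemu-ARCH exists in DIR and its
# first line reads "enabled". DIR defaults to the kernel's binfmt_misc mount.
qemu_handler_enabled() {
    dir=${2:-/proc/sys/fs/binfmt_misc}
    entry="$dir/qemu-$1"
    [ -r "$entry" ] && head -n 1 "$entry" | grep -qx 'enabled'
}
```

Usage: `qemu_handler_enabled s390x && echo "s390x emulation registered"`.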

janvda, Jan 03 '24 14:01

With https://github.com/tonistiigi/binfmt/pull/144 (QEMU 8.1.4) and node:21 it still doesn't work for me on arm32v6, arm32v7 and s390x platforms. Tried building on both macos-latest and ubuntu-latest runners:

  • https://github.com/instrumentisto/haraka-docker-image/commit/8058decb6aa614e214ff336d3360c8c3b493c5b3
  • https://github.com/instrumentisto/haraka-docker-image/commit/c3ef1bd79c4d94223bd7f8862be15060540fc417

tyranron, Jan 05 '24 14:01

Also, the important place to test this is on AMD64 hardware, as this needs to run on GH Actions with the Ubuntu runner.

hardillb, Jan 05 '24 17:01