docker-node
npm hangs on linux/s390x containers
Environment
- Platform: linux/s390x
- Docker Version: 24.0.6, build ed223bc
- Node.js Version: 18
- Image Tag: 18-alpine
Expected Behavior
npm install
runs and packages are installed.
Current Behavior
Building a container for the linux/s390x platform hangs while running npm install
with npm consuming 100% CPU.
Previous builds completed in less than 5 minutes; the current build has been running for over an hour.
We are building the https://github.com/node-red/node-red-docker container with
docker buildx build --platform linux/s390x --file .docker/Dockerfile.alpine --build-arg NODE_VERSION=18 .
Possible Solution
Steps to Reproduce
- Check out https://github.com/node-red/node-red-docker
- cd node-red-docker
- Run docker buildx build --platform linux/s390x --file .docker/Dockerfile.alpine --build-arg NODE_VERSION=18 .
Additional Information
Same thing is happening with 14-alpine and 16-alpine tags
I'm hitting this both locally and in a GH Action, both of which use Qemu to support building for alternate architectures.
I have a similar issue (see Dockerfile).
I wonder whether the problem of #1798 and #1829 finally snuck into 18 and earlier images.
Interesting. I've just fired up the docker image (node:16-alpine and node:18-alpine) on a real s390x system and npm seems to install without any problems. That suggests something specific to qemu or to the docker version in use (mine is Docker version 24.0.5, build 24.0.5-0ubuntu1~22.04.1).
Just tried with your dockerfile - went through without problems:
build18.log.gz
Command: docker build --file .docker/Dockerfile.alpine --build-arg NODE_VERSION=18 . 2>&1 | tee build18.log
This does appear to point to a qemu-based problem. I know my laptop got a recent set of qemu packages, but I'm not sure what would be needed to debug this. Any pointers would be helpful.
@hardillb setup-qemu-action uses the tonistiigi/binfmt Docker image for installing QEMU binaries. I think other versions like 6.1.0 or master could be tried to "resolve" this, at least on GitHub Actions.
master doesn't appear to fix it for me, testing 6.1.0
no joy with qemu-v6.1.0 either, so this may be a NodeJS + Qemu issue
OK, while this appears to be limited to when running builds using qemu, this is going to be the default way 99% of CI builds run that target s390x, so I think we still need to track this down, even if it's just to raise a sensible upstream issue against qemu.
- What debug options can I enable to try and get some useful debug information here?
- Would trying to connect GDB to the spinning process help?
@hardillb seems like after moby/buildkit#1516 we may omit using setup-qemu-action, because BuildKit supports QEMU emulation out-of-the-box. Even more, judging by the tonistiigi/binfmt Docker image tags, newer versions of QEMU are released for buildkit- images only. The last one is 7.1.0.
However, for my repository the result is still the same, no matter which version is used: 6.0.0, 6.1.0, 6.2.0, 7.0.0, 7.1.0 or master.
@hardillb in my case, the problem seems to be related to Linux only, somehow. I was able to resolve the issue just by switching to the macos-latest runner for archs where the build gets stuck.
I will try this workaround for #1798 too, and will report the results.
@tyranron did you get any joy using the docker.io/ prefix on the base containers?
If it is the qemu bug https://gitlab.com/qemu-project/qemu/-/issues/1729 then hopefully it gets fixed soon.
@hardillb
> did you get any joy using the docker.io/ prefix on the base containers?
These are the same images, no?
> I will try this workaround for https://github.com/nodejs/docker-node/issues/1798 too, and will report the results.
Building under the macos-latest runner didn't work out for Node.js 20, but for 18 it fixed my problem.
This may not be the same as the other qemu bug as it's not calling mremap.
I ran the following command:
docker run --platform linux/s390x -it --cap-add=SYS_PTRACE -e QEMU_STRACE=true -e QEMU_LOG_FILENAME=qemu.log -v ./qemu.log:/qemu.log --rm node:18-alpine npm install node-red:3.1.0
and got the following strace:
1 clock_gettime(CLOCK_MONOTONIC,0x0000004007b62cd8) = 0 ({tv_sec = 2223689,tv_nsec = 708242416})
1 clock_gettime(CLOCK_MONOTONIC,0x0000004007b62db8) = 0 ({tv_sec = 2223689,tv_nsec = 708269945})
1 clock_gettime(CLOCK_MONOTONIC,0x0000004007b62db8) = 0 ({tv_sec = 2223689,tv_nsec = 708530498})
1 clock_gettime(CLOCK_MONOTONIC,0x0000004007b62eb0) = 0 ({tv_sec = 2223689,tv_nsec = 708556031})
1 clock_gettime(CLOCK_MONOTONIC,0x0000004007b62eb0) = 0 ({tv_sec = 2223689,tv_nsec = 708593060})
1 munmap(0x00000040101ef000,57344) = 0
1 clock_gettime(CLOCK_MONOTONIC,0x0000004007b63028) = 0 ({tv_sec = 2223689,tv_nsec = 708653152})
1 socket(PF_NETLINK,SOCK_RAW|SOCK_CLOEXEC,NETLINK_ROUTE) = 24
1 sendto(24,275007277688,20,0,0,0) = 20
1 recvfrom(24,275007277688,8192,64,0,0) = 2880
This looks to be spinning trying to receive data from the network. How do we move this forward?
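One way to check whether the process really is looping on a small set of calls is to count syscalls by name across the whole log rather than reading the tail. The function below is only a sketch: the name summarize_qemu_strace is made up for illustration, and it assumes every relevant log line has the same "<pid> <syscall>(<args>) = <result>" shape as the excerpt above.

```shell
# Count syscalls by name in a QEMU_STRACE log (hypothetical helper).
# Assumes each line looks like: "<pid> <syscall>(<args>) = <result>",
# as in the excerpt above; lines with any other shape are ignored.
summarize_qemu_strace() {
  awk '
    # Field 2 starts with the syscall name followed by "(".
    NF >= 2 && $2 ~ /^[a-z_0-9]+\(/ {
      name = $2
      sub(/\(.*/, "", name)   # drop the argument list, keep the name
      count[name]++
    }
    END {
      for (name in count) printf "%d %s\n", count[name], name
    }
  ' "$1" | sort -k1,1nr -k2,2   # most frequent first, ties by name
}
```

Running `summarize_qemu_strace qemu.log | head` against the log file from the command above should show which calls dominate. If recvfrom or clock_gettime far outnumber everything else, that would support the busy-loop reading.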
Due to https://github.com/tonistiigi/binfmt/pull/120 we have QEMU 8.0 in the tonistiigi/binfmt:master Docker image now. Tried it with the node:21 Docker image, and still no luck.
I started seeing this issue on September 19th, 2023.
I created a repo to help diagnose the problem, or to detect when a fix is made upstream. It runs daily tests on two versions of node across six architectures on Debian and Alpine. It simply attempts npm -v.
On Nov 7: 4 of the 12 Alpine combinations are failing.
See:
- https://github.com/felddy/npm-hang-test
- https://github.com/felddy/npm-hang-test/issues/2
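For anyone who wants to run a similar check locally, here is a rough sketch of the idea and not the code from the repo above. The helper name check_with_timeout, the 120 second limit and the platform list are all assumptions for illustration. It relies on GNU coreutils timeout, which exits with status 124 when the limit is reached.

```shell
# Run a command with a time limit and classify the result (hypothetical helper).
# Usage: check_with_timeout <seconds> <command> [args...]
# Prints "ok", "hang" (time limit hit) or "fail" (any other non-zero exit).
check_with_timeout() {
  limit=$1
  shift
  # GNU coreutils `timeout` exits with status 124 when the limit is reached.
  timeout "$limit" "$@" >/dev/null 2>&1
  status=$?
  case $status in
    0)   echo ok ;;
    124) echo hang ;;
    *)   echo fail ;;
  esac
}

# Example (needs Docker with QEMU emulation; the platform list is illustrative):
# for p in linux/amd64 linux/s390x linux/arm/v7; do
#   printf '%s: ' "$p"
#   check_with_timeout 120 docker run --rm --platform "$p" node:18-alpine npm -v
# done
```

A container that is spinning under emulation may not exit cleanly when the time limit fires, so leftover containers might need to be removed by hand.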
Reported a similar problem in ticket #1946.
I've been playing with this again (as it's still a problem). I've been using AWS EC2 machines to try out a few different options.
- It fails on both Intel and AMD based x86_64 hardware
- It fails on AWS Arm64 hardware as well
- It fails with the latest 8.0.6 qemu builds (as provided by the qemu-v8.0.4 tag of tonistiigi/binfmt)
- I've tried Ubuntu 22.04 and 23.10 base OS builds
I tried to run it on ubuntu-20.04 s390x and it works fine, but arm/v6 and arm/v7 still don't work, and only with alpine3.18 and nodejs18. See https://github.com/whyour/qinglong/actions/runs/7375782137/job/20067750407
I tried to reproduce the problem on my macbook and it seems to be working for me. FYI, this is what I get:
mac-jan:tmp jan$ git clone https://github.com/node-red/node-red-docker
Cloning into 'node-red-docker'...
remote: Enumerating objects: 3154, done.
remote: Counting objects: 100% (225/225), done.
remote: Compressing objects: 100% (107/107), done.
remote: Total 3154 (delta 133), reused 197 (delta 118), pack-reused 2929
Receiving objects: 100% (3154/3154), 823.97 KiB | 1.99 MiB/s, done.
Resolving deltas: 100% (1988/1988), done.
mac-jan:tmp jan$ ls
node-red-docker
mac-jan:tmp jan$ cd node-red-docker/
mac-jan:node-red-docker jan$ docker buildx build --platform linux/s390x --file .docker/Dockerfile.alpine --build-arg NODE_VERSION=18 .
[+] Building 331.6s (20/20) FINISHED docker-container:build
=> [internal] load build definition from Dockerfile.alpine 0.1s
=> => transferring dockerfile: 3.55kB 0.0s
=> [internal] load metadata for docker.io/library/node:18-alpine 4.0s
=> [auth] library/node:pull token for registry-1.docker.io 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [base 1/11] FROM docker.io/library/node:18-alpine@sha256:b1a0356f7d6b86c958a06949d3db3f7fb27f95f627aa6157cb98bc65c801efa2 18.3s
=> => resolve docker.io/library/node:18-alpine@sha256:b1a0356f7d6b86c958a06949d3db3f7fb27f95f627aa6157cb98bc65c801efa2 0.0s
=> => sha256:8f566b0cf37515471460f9658e9c86f83bf2350169945c1f2b328eec90ccac61 449B / 449B 2.0s
=> => sha256:743d88e4fdd9423c2822f8530204bc76ce14cdcfc62b97fe81bb5a6115485080 2.34MB / 2.34MB 18.0s
=> => sha256:9d91f34cd4b1eccf088eefcf662f235b2c0ae325b9739b7f7f7e875c25ba8643 41.11MB / 41.11MB 11.1s
=> => sha256:0fca3ee44ced87b7184bc23390283fdf10cfae0e844a25b785dd11c463815227 3.24MB / 3.24MB 3.3s
=> => extracting sha256:0fca3ee44ced87b7184bc23390283fdf10cfae0e844a25b785dd11c463815227 0.2s
=> => extracting sha256:9d91f34cd4b1eccf088eefcf662f235b2c0ae325b9739b7f7f7e875c25ba8643 2.3s
=> => extracting sha256:743d88e4fdd9423c2822f8530204bc76ce14cdcfc62b97fe81bb5a6115485080 0.1s
=> => extracting sha256:8f566b0cf37515471460f9658e9c86f83bf2350169945c1f2b328eec90ccac61 0.0s
=> [internal] load build context 0.1s
=> => transferring context: 7.81kB 0.0s
=> [base 2/11] COPY .docker/scripts/*.sh /tmp/ 0.0s
=> [base 3/11] COPY .docker/healthcheck.js / 0.0s
=> [base 4/11] RUN set -ex && apk add --no-cache bash tzdata iputils curl nano git openssl open 8.1s
=> [base 5/11] WORKDIR /usr/src/node-red 0.0s
=> [base 6/11] COPY .docker/known_hosts.sh . 0.0s
=> [base 7/11] RUN ./known_hosts.sh /etc/ssh/ssh_known_hosts && rm /usr/src/node-red/known_hosts.sh 71.6s
=> [base 8/11] RUN echo "PubkeyAcceptedKeyTypes +ssh-rsa" >> /etc/ssh/ssh_config 0.2s
=> [base 9/11] COPY package.json . 0.0s
=> [base 10/11] COPY flows.json /data 0.1s
=> [base 11/11] COPY .docker/scripts/entrypoint.sh . 0.1s
=> [build 1/1] RUN apk add --no-cache --virtual buildtools build-base linux-headers udev python3 && npm install --unsafe-perm --no-update-notifier --no-audit 178.4s
=> [release 1/3] COPY --from=build /usr/src/node-red/prod_node_modules ./node_modules 0.8s
=> [release 2/3] RUN chown -R node-red:root /usr/src/node-red && /tmp/install_devtools.sh && rm -r /tmp/* 41.5s
=> [release 3/3] RUN npm config set cache /data/.npm --global 7.3s
WARNING: No output specified with docker-container driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load
mac-jan:node-red-docker jan$
FYI My macbook docker setup:
1/ I have installed lima (so I don't use docker desktop)
# install lima
brew install lima
# create default lima instance with 6GB memory using docker template
limactl start --name=default --set='.cpus = 4 | .memory = "6GiB" | .disk = "100GiB" ' template://docker
# create docker context - note that the actual unix socket path is returned by the previous command.
docker context create colima --docker "host=unix:///Users/jan/.lima/default/sock/docker.sock"
# starts the docker environment on my macbook.
limactl start
2/ I have installed Docker Buildx as follows:
# in folder /Users/jan/.docker/cli-plugins
wget https://github.com/docker/buildx/releases/download/v0.10.3/buildx-v0.10.3.darwin-amd64
mv buildx-v0.10.3.darwin-amd64 docker-buildx
chmod a+x docker-buildx
Add binfmt_misc support for additional platforms as specified in https://docs.docker.com/build/building/multi-platform/
docker run --privileged --rm tonistiigi/binfmt --install all
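After the install step above, one way to confirm which emulators are actually registered is to look for qemu-* entries in the kernel's binfmt_misc directory. This is a sketch under assumptions: the function name list_qemu_handlers is made up, and it assumes the handlers are registered under names starting with qemu-. It has to run on the Linux host that executes the containers, which in a lima setup like this one would be inside the VM and not on macOS itself.

```shell
# List the QEMU handlers registered with binfmt_misc (hypothetical helper).
# The directory defaults to the standard Linux mount point; it is accepted
# as an argument only so the function can be pointed at another location.
list_qemu_handlers() {
  dir=${1:-/proc/sys/fs/binfmt_misc}
  for entry in "$dir"/qemu-*; do
    # Skip the unexpanded glob when nothing is registered.
    [ -e "$entry" ] || continue
    basename "$entry"
  done
}
```

If list_qemu_handlers prints nothing, or qemu-s390x is missing from the output, the build would be failing for a different reason than the hang discussed in this thread.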
With https://github.com/tonistiigi/binfmt/pull/144 (QEMU 8.1.4) and node:21 it still doesn't work for me on arm32v6, arm32v7 and s390x platforms. Tried building on both macos-latest and ubuntu-latest runners:
- https://github.com/instrumentisto/haraka-docker-image/commit/8058decb6aa614e214ff336d3360c8c3b493c5b3
- https://github.com/instrumentisto/haraka-docker-image/commit/c3ef1bd79c4d94223bd7f8862be15060540fc417
Also, the important place to test this is on AMD64 hardware, as this needs to run on GH Actions with the Ubuntu runner.