Unable to construct viable Docker image using `node:20-alpine`
Our build uses an ubuntu-latest GitHub runner to build a Docker image.
Our Dockerfile follows the example provided in this repo.
FROM node:20-alpine
COPY ./dist /app/
WORKDIR /app
RUN apk --no-cache add \
bash \
g++ \
ca-certificates \
lz4-dev \
musl-dev \
cyrus-sasl-dev \
openssl-dev \
make \
python3 \
gcompat # added to provide missing ld-linux-x86-64.so.2
RUN apk add --no-cache --virtual .build-deps gcc zlib-dev libc-dev bsd-compat-headers py-setuptools bash
RUN npm install --omit=dev
EXPOSE 4000
CMD [ "node", "app.js" ]
The deployed pod is hosted in AKS, and both the runners and host nodes are amd64 arch.
Without the @confluentinc/kafka-javascript dependency in the package.json, the application will start without issue on the container.
With the @confluentinc/kafka-javascript dependency in the package.json (and no reference from the application), the application will immediately fail with:
Segmentation fault (core dumped)
While troubleshooting, we discovered that if we reinstalled the package on the running container, the application would then start up normally.
Our initial thought was that the wrong flavor of librdkafka was being downloaded.
By adding the following to the Dockerfile, I was able to capture the node-pre-gyp output:
WORKDIR /app/node_modules/@confluentinc/kafka-javascript
RUN npx node-pre-gyp install --update-binary
WORKDIR /app
#12 0.816 node-pre-gyp info using [email protected]
#12 0.816 node-pre-gyp info using [email protected] | linux | x64
#12 0.906 node-pre-gyp http GET https://github.com/confluentinc/confluent-kafka-javascript/releases/download/v0.1.15-devel/confluent-kafka-javascript-v0.1.15-devel-node-v115-linux-musl-x64.tar.gz
Again, launching this container results in the segmentation fault on startup.
Starting the container, and running the following:
cd node_modules/\@confluentinc/kafka-javascript/
npx node-pre-gyp install --update-binary
cd /app
...seemingly performs the same operation we saw during the Docker image construction:
node-pre-gyp info using [email protected]
node-pre-gyp info using [email protected] | linux | x64
http GET https://github.com/confluentinc/confluent-kafka-javascript/releases/download/v0.1.15-devel/confluent-kafka-javascript-v0.1.15-devel-node-v115-linux-musl-x64.tar.gz
...yet after this operation is performed, the application starts without issue.
Please help us to understand what is going on here, and how we can solve this problem.
When I do a diff of the node_modules/@confluentinc/kafka-javascript/build/Release/ directories before/after running the node-pre-gyp on the started container, the noticeable difference is many instances of:
- /home/semaphore/.cache replaced with /root/.cache
- /home/semaphore/confluent-kafka-javascript replaced with /v
If I download directly from confluent-kafka-javascript-v0.1.15-devel-node-v115-linux-musl-x64.tar.gz I see the references are all /root/.cache and /v; so it's unclear to me how the docker image is ending up with an apparently different version despite resolving to the same download URL.
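One way to check which variant of the binary a given image or container holds is to search it for the build paths mentioned above. This is only a sketch: the helper name `embedded_prefix` is hypothetical, the two prefixes are the ones reported in the diff, and the addon path is assumed from the directory and file names used in this thread.

```shell
#!/bin/sh
# Hypothetical helper: report which build-path prefix is embedded in a binary.
# The two prefixes are taken from the diff described above.
# -a treats the binary as text (needed by GNU grep), -F matches a fixed string.
embedded_prefix() {
    if grep -q -a -F '/home/semaphore' "$1"; then
        echo "semaphore"
    elif grep -q -a -F '/root/.cache' "$1"; then
        echo "root"
    else
        echo "unknown"
    fi
}

# Assumed location of the native addon, run from the application directory.
ADDON="node_modules/@confluentinc/kafka-javascript/build/Release/confluent-kafka-javascript.node"

if [ -f "$ADDON" ]; then
    echo "embedded build prefix: $(embedded_prefix "$ADDON")"
fi
```

Running this once inside the freshly built image and once after `npx node-pre-gyp install --update-binary` should show whether the two copies really differ in the way the diff suggests.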
Hey - I repro'd this issue, but I'm not sure of the cause yet. The confluent-kafka-javascript.node is different at the start and at the end after running npx node-pre-gyp install --update-binary (I checked with the md5sum).
Here's my process:
- npm init an app in the 'dist' folder and install @confluentinc/kafka-javascript. MD5 sum of confluent-kafka-javascript.node = X
- Build and run docker file. Here too MD5 sum of confluent-kafka-javascript.node = X
- Run the node-pre-gyp command. Now the MD5 sum of confluent-kafka-javascript.node = Y, and the linked libraries have also changed (checked by running ldd).
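The comparison in the steps above can be sketched as a small script run inside the container from /app. It is a rough illustration, not the exact commands used: the helper name `fingerprint` and the /tmp file names are hypothetical, and the addon path is assumed from the build/Release directory and file name mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print only the MD5 checksum of a file
# (the first field of md5sum's output).
fingerprint() {
    md5sum "$1" | cut -d ' ' -f 1
}

# Assumed locations, relative to the application directory.
PKG_DIR="node_modules/@confluentinc/kafka-javascript"
ADDON="$PKG_DIR/build/Release/confluent-kafka-javascript.node"

if [ -f "$ADDON" ]; then
    # Record checksum and dynamic links before touching anything.
    before=$(fingerprint "$ADDON")
    ldd "$ADDON" > /tmp/ldd-before.txt 2>&1 || true

    # Re-download the prebuilt binary, as done earlier in the thread.
    (cd "$PKG_DIR" && npx node-pre-gyp install --update-binary)

    # Record the same information afterwards and compare.
    after=$(fingerprint "$ADDON")
    ldd "$ADDON" > /tmp/ldd-after.txt 2>&1 || true

    if [ "$before" = "$after" ]; then
        echo "addon unchanged: $before"
    else
        echo "addon changed: $before -> $after"
        diff /tmp/ldd-before.txt /tmp/ldd-after.txt || true
    fi
fi
```

If the checksums differ, the `diff` of the two `ldd` outputs shows which shared libraries each copy of the addon links against.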
Suggested workaround for now:
COPY ./dist /app/
WORKDIR /app
+ RUN rm -rf node_modules
(you can also just delete node_modules/@confluentinc if you want to be more specific).
As far as I can tell, the npm install within the Dockerfile doesn't re-pull the right platform/libc combination of confluent-kafka-javascript.node; it keeps whatever is already in node_modules unless that directory is empty.
Also, since a precompiled binary is now available, the Dockerfile can be trimmed considerably, to something like:
FROM node:20-alpine
COPY ./dist /app/
WORKDIR /app
RUN rm -rf node_modules/\@confluentinc
RUN npm install --omit=dev
EXPOSE 4000
CMD [ "node", "app.js" ]
I will update the example.
I have a fix in mind: changing the npm install script to node-pre-gyp install --fallback-to-build --update-binary rather than node-pre-gyp install --fallback-to-build. However, that would download the remote binary more often than necessary, so I'm not making that change immediately.
I'll discuss that, and other possible solutions with my team, and provide a fix.
Closing this out. Using node-pre-gyp install --fallback-to-build --update-binary has a significant impact because it affects every installation, so at the moment we're not considering changing this step. Please reopen if the issue persists with the Dockerfile changes.