Cannot find package 'tokenizers-linux-x64-musl' - Alpine support
Opening another issue for tokenizers support on Alpine. The error:
error: Cannot find package 'tokenizers-linux-x64-musl' from '/usr/src/app/node_modules/tokenizers/index.js'
Bun v1.1.38 (Linux x64 baseline)
/usr/src/app # ./mycli
155 | if (isMusl()) {
156 | localFileExisted = existsSync(join(__dirname, "tokenizers.linux-x64-musl.node"));
157 | try {
158 | if (localFileExisted) {
159 | nativeBinding = (()=>{throw new Error("Cannot require module "+"./tokenizers.linux-x64-musl.node");})();
160 | nativeBinding = (()=>{throw new Error("Cannot require module "+"tokenizers-linux-x64-musl");})();
^
error: Cannot require module tokenizers-linux-x64-musl
at /$bunfs/root/mycli:160:43
at /$bunfs/root/mycli:160:109
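For context on the trace above: the loader first looks for a local `.node` file and then falls back to a platform-specific package. A minimal sketch of how that package name is derived, assuming the usual napi-rs naming convention (`gnu` for glibc, `musl` for Alpine). `expectedLinuxBinding` is a hypothetical helper written for illustration, not part of tokenizers or napi-rs:

```javascript
// Hypothetical helper (not part of tokenizers or napi-rs): builds the name of
// the optional native package that a napi-rs style loader looks for on Linux.
// Assumption: the libc suffix is "musl" on Alpine and "gnu" on glibc distros.
function expectedLinuxBinding(baseName, arch, isMusl) {
  const libc = isMusl ? "musl" : "gnu";
  return `${baseName}-linux-${arch}-${libc}`;
}

// On Alpine (musl libc) with an x64 CPU this is the package from the error:
console.log(expectedLinuxBinding("tokenizers", "x64", true));

// On a glibc distribution the same loader would ask for the gnu variant:
console.log(expectedLinuxBinding("tokenizers", "x64", false));
```

If no package with the musl name was published for the installed version, neither the local file check nor the package lookup can succeed, which would match the failure shown here.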
tokenizers.js:
import { Tokenizer } from "tokenizers";
const tokenizer = await Tokenizer.fromFile("tokenizer.json");
const wpEncoded = await tokenizer.encode("Who is John?");
Dockerfile:
FROM oven/bun:alpine AS base
WORKDIR /usr/src/app
FROM base AS install
RUN mkdir -p /temp/dev
COPY package.json bun.lockb /temp/dev/
RUN cd /temp/dev && bun install --frozen-lockfile
# install with --production (exclude devDependencies)
RUN mkdir -p /temp/prod
COPY package.json bun.lockb /temp/prod/
RUN cd /temp/prod && bun install --frozen-lockfile --production
# copy node_modules from temp directory
# then copy all (non-ignored) project files into the image
FROM base AS prerelease
COPY --from=install /temp/dev/node_modules node_modules
COPY . .
# copy production dependencies and source code into final image
FROM base AS release
# RUN apk add --no-cache gcompat python3 make gcc g++ glibc-2.35-r1.apk wget
RUN apk add --no-cache \
gcompat \
libc6-compat \
python3 \
make \
gcc \
g++ \
bash \
libstdc++ \
musl-dev \
wget
# RUN wget -q -O /etc/apk/keys/sgerrand.rsa.pub https://alpine-pkgs.sgerrand.com/sgerrand.rsa.pub && \
# wget -q https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.35-r0/glibc-2.35-r0.apk && \
# apk add glibc-2.35-r0.apk && \
# rm glibc-2.35-r0.apk
COPY --from=install /temp/prod/node_modules node_modules
COPY --from=prerelease /usr/src/app/*.js ./
COPY --from=prerelease /usr/src/app/package.json .
# RUN ldd /usr/src/app/node_modules/onnxruntime-node/bin/napi-v3/linux/x64/libonnxruntime.so.1.14.0
# run the app
USER bun
# EXPOSE 3000/tcp
ENTRYPOINT [ "bun", "run", "tokenizers.js" ]
What version of tokenizers are you referring to? We haven't uploaded tokenizers.js to NPM in a loooong while (we did rewrite everything with napi, but frankly it seems the work to maintain the JS branch wasn't worth it).
Cheers.
Ah right I see, you have a point.
It does seem like I was testing with the old "tokenizers": "^0.13.3", from https://github.com/huggingface/tokenizers/tree/main/bindings/node.
I might have to revisit the issue, as I still haven't got an onnx/tokenizer JS lib working on Alpine other than by using Bun to compile a CLI binary. So please do let me know if there's an embedding service that works on Alpine at all.
But given this issue was filed against an older version, I may have no choice but to close it for now.
@PylotLight Did you ever get this working with Bun?
Haven't touched it for a while. I think I managed to at least get a Bun static binary compiled, but obviously this is not ideal compared to running the uncompiled version when used as a library. So it didn't solve the main issue, but I think I sort of got a workaround.
I found @anush008/tokenizers, which seems to work well.