
Cannot find package 'tokenizers-linux-x64-musl' - Alpine support

PylotLight opened this issue 1 year ago • 5 comments

Creating another issue for tokenizers support on Alpine. The error:

error: Cannot find package 'tokenizers-linux-x64-musl' from '/usr/src/app/node_modules/tokenizers/index.js'
Bun v1.1.38 (Linux x64 baseline)

/usr/src/app # ./mycli 
155 |         if (isMusl()) {
156 |           localFileExisted = existsSync(join(__dirname, "tokenizers.linux-x64-musl.node"));
157 |           try {
158 |             if (localFileExisted) {
159 |               nativeBinding = (()=>{throw new Error("Cannot require module "+"./tokenizers.linux-x64-musl.node");})();
160 |               nativeBinding = (()=>{throw new Error("Cannot require module "+"tokenizers-linux-x64-musl");})();
                                                ^
error: Cannot require module tokenizers-linux-x64-musl
      at /$bunfs/root/mycli:160:43
      at /$bunfs/root/mycli:160:109
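
For context, tokenizers' index.js uses the standard napi-rs platform loader: at runtime it picks a platform-specific binding, preferring a local .node file and falling back to a platform npm package. When bun build --compile cannot resolve those requires at bundle time, it replaces them with the throw stubs visible at lines 159-160 above. A simplified sketch of the loader (not the exact upstream source):

// Simplified sketch of the napi-rs platform loader (not the exact
// upstream source of tokenizers/index.js).
const { existsSync } = require("fs");
const { join } = require("path");

function isMusl() {
  // musl builds report no glibc runtime version; upstream uses a
  // similar check.
  return !process.report?.getReport()?.header?.glibcVersionRuntime;
}

let nativeBinding;
if (process.platform === "linux" && process.arch === "x64" && isMusl()) {
  const localFileExisted = existsSync(
    join(__dirname, "tokenizers.linux-x64-musl.node")
  );
  nativeBinding = localFileExisted
    ? require("./tokenizers.linux-x64-musl.node") // file shipped in the package
    : require("tokenizers-linux-x64-musl"); // separate platform package
}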

tokenizers.js:

import { Tokenizer } from "tokenizers";
const tokenizer = await Tokenizer.fromFile("tokenizer.json");
const wpEncoded = await tokenizer.encode("Who is John?");

Dockerfile:

FROM oven/bun:alpine AS base
WORKDIR /usr/src/app


FROM base AS install
RUN mkdir -p /temp/dev
COPY package.json bun.lockb /temp/dev/
RUN cd /temp/dev && bun install --frozen-lockfile

# install with --production (exclude devDependencies)
RUN mkdir -p /temp/prod
COPY package.json bun.lockb /temp/prod/
RUN cd /temp/prod && bun install --frozen-lockfile --production


# copy node_modules from temp directory
# then copy all (non-ignored) project files into the image
FROM base AS prerelease
COPY --from=install /temp/dev/node_modules node_modules
COPY . .

# copy production dependencies and source code into final image
FROM base AS release
# RUN apk add --no-cache gcompat python3 make gcc g++ glibc-2.35-r1.apk wget
RUN apk add --no-cache \
    gcompat \
    libc6-compat \
    python3 \
    make \
    gcc \
    g++ \
    bash \
    libstdc++ \
    musl-dev \
    wget
    
# RUN wget -q -O /etc/apk/keys/sgerrand.rsa.pub https://alpine-pkgs.sgerrand.com/sgerrand.rsa.pub && \
#     wget -q https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.35-r0/glibc-2.35-r0.apk && \
#     apk add glibc-2.35-r0.apk && \
#     rm glibc-2.35-r0.apk

COPY --from=install /temp/prod/node_modules node_modules
COPY --from=prerelease /usr/src/app/*.js .
COPY --from=prerelease /usr/src/app/package.json .
# RUN ldd /usr/src/app/node_modules/onnxruntime-node/bin/napi-v3/linux/x64/libonnxruntime.so.1.14.0
# run the app
USER bun
# EXPOSE 3000/tcp
ENTRYPOINT [ "bun", "run", "tokenizers.js" ]

PylotLight avatar Dec 14 '24 01:12 PylotLight

What version of tokenizers are you referring to? We haven't uploaded tokenizers.js to NPM in a long while (we did rewrite everything with napi, but frankly it seems the work to maintain the JS branch wasn't worth it).

Cheers.

Narsil avatar Jan 09 '25 12:01 Narsil

Ah right, I see your point. It does seem I was testing with the old "tokenizers": "^0.13.3" from https://github.com/huggingface/tokenizers/tree/main/bindings/node.

I might have to revisit the issue, as I still haven't got a working onnx/tokenizer JS lib running on Alpine outside of using Bun to compile a CLI binary. So please do let me know if there's a working embedding service on Alpine at all.

But given this issue was filed against an older version, I may have no choice but to close it for now.

PylotLight avatar Jan 10 '25 06:01 PylotLight

@PylotLight Did you ever get this working with Bun?

francescov1 avatar Aug 27 '25 21:08 francescov1

> @PylotLight Did you ever get this working with Bun?

Haven't touched it for a while. I think I managed to at least get a Bun static binary compiled, but obviously that's not ideal compared to running the uncompiled version when it's used as a library. So it didn't solve the main issue, but I did end up with a semi-workaround.

PylotLight avatar Aug 27 '25 23:08 PylotLight

I found @anush008/tokenizers, which seems to work well.

francescov1 avatar Aug 27 '25 23:08 francescov1
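
For reference, a minimal usage sketch with @anush008/tokenizers. This assumes its API mirrors the original node bindings (Tokenizer.fromFile / encode), which is what the package advertises; verify against its own docs:

// Minimal sketch, assuming @anush008/tokenizers mirrors the original
// bindings' API; the getIds() accessor name follows those bindings.
import { Tokenizer } from "@anush008/tokenizers";

const tokenizer = await Tokenizer.fromFile("tokenizer.json");
const encoded = await tokenizer.encode("Who is John?");
console.log(encoded.getIds ? encoded.getIds() : encoded);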