transformers.js
[Question] Build step process for Vercel
Hi, I am currently in the process of trying to deploy to Vercel using Next.js. I am using pnpm as my package manager and have put the model in the public folder. I hit this error when the build runs. Is some post-install step necessary, as in #295?
I don't understand why this step would be necessary.
An error occurred while writing the file to cache: [Error: ENOENT: no such file or directory, mkdir '/var/task/node_modules/.pnpm/@[email protected]/node_modules/@xenova/transformers/.cache'
I don't quite see why that would happen in the build stage: the model is only cached if (1) it doesn't already exist locally and (2) the model is actually accessed 👀
My guess is that the location or structure of the model folder is incorrect: can you share the code you're using as well as the folder structure?
Sure. It's a Next.js app (version 12). The code is below, and all the models are under public.
import type { NextApiRequest, NextApiResponse } from 'next'
import {
  env,
  AutoTokenizer,
  AutoProcessor,
  AutoModel,
  RawImage,
  softmax,
} from '@xenova/transformers'

//@ts-ignore
env.allowRemoteModels = false
env.localModelPath = process.env.NEXT_PUBLIC_URL + '/models/'
env.cacheDir = process.env.NEXT_PUBLIC_URL + '/cache/'

// PredictResV2 is an app-specific response type defined elsewhere in the project
export default async function handler(req: NextApiRequest, res: NextApiResponse<PredictResV2>) {
  const { classLabels, targetLabel, prompts, uploadDate, drawing } = req.body
  // const { tokenizer, processor, model } = await PipelineSingleton.getInstance()

  // Load the CLIP tokenizer, processor, and model
  const modelID = 'Xenova/clip-vit-base-patch16'
  const tokenizer = await AutoTokenizer.from_pretrained(modelID)
  const processor = await AutoProcessor.from_pretrained(modelID)
  const model = await AutoModel.from_pretrained(modelID)

  const blob = await (await fetch(drawing)).blob()
  const image = await RawImage.fromBlob(blob)
  const image_inputs = await processor(image)

  const targetLabels = targetLabel.split(',')

  // Create label tokens
  // ----------------------------------------------------------------------
  // Filter out target labels from class labels
  const class_labels = classLabels.filter((x: string) => !targetLabels.includes(x))
  // Only use the first prompt for class labels
  const class_label_prompts = class_labels.map((label: string) => `${prompts[0]} ${label}`)
  // Use all prompts for target labels
  const compound_labels: Record<string, string[]> = {}
  let all_renamed_labels: string[] = []
  let target_label_prompts: string[] = []
  for (const t of targetLabels) {
    target_label_prompts = prompts.map((prompt: string) => `${prompt} ${t}`)
    const renamed_target_labels = Array.from(
      { length: target_label_prompts.length },
      (_, i) => `${t}_${i}`,
    )
    compound_labels[t] = renamed_target_labels
    all_renamed_labels = all_renamed_labels.concat(renamed_target_labels)
  }
  const labels = all_renamed_labels.concat(class_labels)

  const tokens = tokenizer(target_label_prompts.concat(class_label_prompts), {
    padding: true,
    truncation: true,
  })
  const output = await model({ ...tokens, ...image_inputs })
  const logits_per_image = output.logits_per_image.data
  // console.log('logits_per_image', logits_per_image)
  const probs = softmax(logits_per_image)

  // Return this as a list
  const predictRes: PredictResV2 = {
    probs: probs,
    labels,
    compoundLabels: compound_labels,
    targetLabel,
    classLabels,
    prompts,
    uploadDate,
  }
  res.status(200).json(predictRes)
}
So it seems that this is applicable to any serverless function framework, as I can't get this going on Firebase either, even without the env settings. Would love to have some help @xenova
import { onRequest } from "firebase-functions/v2/https"
import * as logger from "firebase-functions/logger"
import { pipeline } from "@xenova/transformers"

export const test = onRequest(async (request, response) => {
  logger.info("Hello logs!", { structuredData: true })
  const classifier = await pipeline(
    "zero-shot-image-classification",
    "Xenova/clip-vit-base-patch32",
  )
  const url = "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg"
  const out = await classifier(url, ["tiger", "horse", "dog"])
  response.status(200).json({ message: out })
})
Hi there - do you think this could be a permission error? Also, does the directory shown in fact exist?
@xenova As I have gotten the function to work with yarn, it seems like a problem with how cloud vendors handle pnpm. Maybe the vendors have some kind of centralized pnpm store so that they can cache shared packages? In any case, this works when using yarn instead of pnpm, which I think might be the case for Vercel as well.
import * as functions from "firebase-functions"
import * as logger from "firebase-functions/logger"
import { pipeline } from "@xenova/transformers"

export const test = functions
  .runWith({ memory: "1GB", timeoutSeconds: 360 })
  .https.onRequest(async (request, response) => {
    logger.info("Hello logs!", { structuredData: true })
    const classifier = await pipeline(
      "zero-shot-image-classification",
      "Xenova/clip-vit-base-patch32",
    )
    const url = "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg"
    const out = await classifier(url, ["tiger", "horse", "dog"])
    response.status(200).json({ message: out })
  })
Vercel prevents write access everywhere except /tmp for serverless functions: https://github.com/orgs/vercel/discussions/1560
I got a working API with the option below:
option.useFSCache = false
However, the problem is that Vercel fetches the ONNX model for each API request (with different inputs, to prevent caching). That is quite slow (about 3 s for a small model).
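For reference, here is a minimal sketch of that workaround as a pages API route. It assumes the flag is set on the library's env export (the snippet above calls the object option), and the request/response shape is just an example:

// Sketch: disable the filesystem cache so transformers.js never tries to write
// under the read-only node_modules path on Vercel.
import type { NextApiRequest, NextApiResponse } from 'next'
import { pipeline, env } from '@xenova/transformers'

env.useFSCache = false // don't persist downloaded model files to disk

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  // Without a cache, the model is fetched again whenever it isn't already in memory,
  // which is the ~3 s slowness described above.
  const classifier = await pipeline(
    'zero-shot-image-classification',
    'Xenova/clip-vit-base-patch32',
  )
  const out = await classifier(req.body.url, ['tiger', 'horse', 'dog'])
  res.status(200).json({ message: out })
}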
I tried to use a local model via Vercel's includeFiles option (https://vercel.com/docs/projects/project-configuration#functions):
option.localModelPath = join(process.cwd(), 'models')
"includeFiles": "models/**/*",
with a models folder that contains submodule repos holding the ONNX files. But this error occurs and I'm still investigating:
…ages/api/recommend-pages.js:1:1904)
SyntaxError: Unexpected token v in JSON at position 0
at JSON.parse (<anonymous>)
at getModelJSON (file:///var/task/node_modules/.pnpm/@[email protected]/node_modules/@xenova/transformers/src/utils/hub.js:551:17)
at async Promise.all (index 0)
at async loadTokenizer (file:///var/task/node_modules/.pnpm/@[email protected]/node_modules/@xenova/transformers/src/tokenizers.js:52:16)
at async AutoTokenizer.from_pretrained (file:///var/task/node_modules/.pnpm/@[email protected]/node_modules/@xenova/transformers/src/tokenizers.js:3824:48)
at async Promise.all (index 0)
at async loadItems (file:///var/task/node_modules/.pnpm/@[email protected]/node_modules/@xenova/transformers/src/pipelines.js:2305:5)
at async pipeline (file:///var/task/node_modules/.pnpm/@[email protected]/node_modules/@xenova/transformers/src/pipelines.js:2251:19)
at async searchNotion (/var/task/.next/server/pages/api/recommend-pages.js:1:1904)
Error: Failed to load model because protobuf parsing failed.
at new OnnxruntimeSessionHandler (/var/task/node_modules/.pnpm/[email protected]/node_modules/onnxruntime-node/dist/backend.js:27:92)
at /var/task/node_modules/.pnpm/[email protected]/node_modules/onnxruntime-node/dist/backend.js:64:29
at process.processTicksAndRejections (node:internal/process/task_queues:77:11)
Something went wrong during model construction (most likely a missing operation). Using `wasm` as a fallback.
Aborted(Error: ENOENT: no such file or directory, open '/var/task/node_modules/.pnpm/@[email protected]/node_modules/@xenova/transformers/dist/ort-wasm-simd.wasm')
failed to asynchronously prepare wasm: RuntimeError: Aborted(Error: ENOENT: no such file or directory, open '/var/task/node_modules/.pnpm/@[email protected]/node_modules/@xenova/transformers/dist/ort-wasm-simd.wasm'). Build with -sASSERTIONS for more info.
Aborted(RuntimeError: Aborted(Error: ENOENT: no such file or directory, open '/var/task/node_modules/.pnpm/@[email protected]/node_modules/@xenova/transformers/dist/ort-wasm-simd.wasm'). Build with -sASSERTIONS for more info.)
fe92cef9-6dfb-5618-8c96-6a3869e036dc
SyntaxError: Unexpected token v in JSON at position 0
at JSON.parse (<anonymous>)
at getModelJSON (file:///var/task/node_modules/.pnpm/@[email protected]/node_modules/@xenova/transformers/src/utils/hub.js:551:17)
at async Promise.all (index 0)
at async loadTokenizer (file:///var/task/node_modules/.pnpm/@[email protected]/node_modules/@xenova/transformers/src/tokenizers.js:52:16)
at async AutoTokenizer.from_pretrained (file:///var/task/node_modules/.pnpm/@[email protected]/node_modules/@xenova/transformers/src/tokenizers.js:3824:48)
at async Promise.all (index 0)
at async loadItems (file:///var/task/node_modules/.pnpm/@[email protected]/node_modules/@xenova/transformers/src/pipelines.js:2305:5)
at async pipeline (file:///var/task/node_modules/.pnpm/@[email protected]/node_modules/@xenova/transformers/src/pipelines.js:2251:19)
at async searchNotion (/var/task/.next/server/pages/api/recommend-pages.js:1:1904)
Error: Runtime exited with error: exit status 1
Runtime.ExitError
According to this issue https://github.com/xenova/transformers.js/issues/295, the protobuf error can be caused by a memory issue. But my memory is 3008 MB (Pro plan), and the API works with the same model when it is fetched remotely from the HF Hub.
I spent many hours yesterday reading various articles and discussions to get this working, and I seem to have succeeded.
Let me try to summarize it here; at some point I'll make a sample repo. My use case is part of a larger workflow, so this is only tested with a Next.js 13.5 API route. My app will run this thousands of times a day and I don't want to be pounding Hugging Face to download the model over and over.
First I created my function to run the model using a singleton approach:
import { pipeline, env, Tensor, Pipeline } from '@xenova/transformers';

//env.backends.onnx.wasm.numThreads = 1;
env.allowRemoteModels = false;
env.allowLocalModels = true;
env.localModelPath = process.cwd() + "/models";

class EmbeddingSingleton {
  static task = 'feature-extraction';
  static model = 'Supabase/gte-small';
  static instance: Promise<Pipeline>;

  static getInstance() {
    this.instance ??= pipeline(this.task, this.model);
    return this.instance;
  }
}

export async function embedding_vector(content: string): Promise<unknown[]> {
  // really returns an array of numbers, but I'm not fighting TS right now; someone can fix it later
  const extractor = await EmbeddingSingleton.getInstance();
  const output: Tensor = await extractor(content, {
    pooling: 'mean',
    normalize: true,
  });
  //logger.debug(`tensor = `, output);

  // Extract the embedding output
  const embedding = Array.from(output.data);
  return embedding;
}
Then I created an endpoint that calls this function and returns JSON for me. This is standard route-handling stuff, so no need for a full example here, but a minimal sketch follows. It is important to make it dynamic with export const dynamic = 'force-dynamic'; or else it will be rendered statically at build time.
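A minimal sketch of such a route handler, assuming the app router and a hypothetical app/api/embedding/route.ts path (adjust the import to wherever embedding_vector actually lives):

// app/api/embedding/route.ts (hypothetical path)
import { NextResponse } from 'next/server';
import { embedding_vector } from '@/lib/embedding'; // assumed location of the function above

export const dynamic = 'force-dynamic'; // don't render this statically at build time

export async function POST(request: Request) {
  const { content } = await request.json();
  const embedding = await embedding_vector(content);
  return NextResponse.json({ embedding });
}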
In my next.config.mjs file I added the following (and installed the plugin as well):
import CopyPlugin from "copy-webpack-plugin";

// this bit goes in the webpack section:
config.plugins.push(
  new CopyPlugin({
    patterns: [
      {
        // copy the transformers.js models we need into the build folder
        from: "models",
        to: "models",
      },
    ],
  }),
);

// this next bit goes in the experimental section. I'm not 100% sure what this is doing or if it is really necessary.
// Indicate that these packages should not be bundled by webpack (for transformers.js)
serverComponentsExternalPackages: ['sharp', 'onnxruntime-node'],
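Putting those fragments in context, here is a sketch of how the whole next.config.mjs might look. The surrounding structure is an assumption rather than a verbatim copy of my file; merge it with whatever config you already have:

// next.config.mjs (sketch; merge with your existing config)
import CopyPlugin from "copy-webpack-plugin";

/** @type {import('next').NextConfig} */
const nextConfig = {
  experimental: {
    // keep these packages out of the webpack bundle (needed for transformers.js)
    serverComponentsExternalPackages: ["sharp", "onnxruntime-node"],
  },
  webpack: (config) => {
    // copy the local models folder into the build output
    config.plugins.push(
      new CopyPlugin({
        patterns: [{ from: "models", to: "models" }],
      })
    );
    return config;
  },
};

export default nextConfig;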
Then I made my local workstation download the models into the transformers cache: I commented out the env lines that force it to use a local copy and ran my test suite. Once that was done, I just moved them into the models folder:
mkdir models
mv node_modules/@xenova/transformers/.cache/Supabase models
Uncomment the env lines again and test that it works locally: it shouldn't re-populate the transformers cache folder.
Commit and push to Vercel.
Profit (well, this part I'm still figuring out).
Note this will only work with the Node.js runtime on Vercel; you can pin it with the segment config sketched below. If you specify the "edge" runtime, it will not be able to read the local files, and I have yet to solve this. Also, with the edge runtime I cannot get it to run at all, because it errors on wasm loading. But that's for another ticket...
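For completeness, the route segment config referenced above (a standard Next.js app-router export, placed in the route handler file):

// In the route handler (e.g. the hypothetical app/api/embedding/route.ts above),
// alongside the dynamic flag already shown:
export const runtime = 'nodejs'; // use the Node.js runtime; 'edge' cannot read the bundled model files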
@khera does this only work in the pro plan? I am also thinking about putting it in the vercel blob, if that makes it any better.
@kyeshmz vercel server blob's size limit is too small (4.5 MB) for a model
@khera does this only work in the pro plan? I am also thinking about putting it in the vercel blob, if that makes it any better.
I don't know, I have a pro plan. The model I'm using is only 32MB.
@kyeshmz vercel server blob's size limit is too small (4.5 MB) for a model
https://vercel.com/docs/functions/serverless-functions/runtimes#size-limits says it is 250MB. The 4.5MB is compressed response output size.
@khera I think he meant Vercel Blob storage
Ah yeah, Vercel Blob storage. But I guess I will put it up on Cloudflare R2 anyway.
Can anyone verify that Vercel is caching the model when I don't include it in my package? When I look at the Vercel usage page, it shows nothing in the data cache, which is where one would expect anything retrieved via fetch() to show up in my Next.js app. I'm dynamically fetching the default sentiment-analysis model, which should be about 60KB.
It is difficult to tell from the startup time whether I'm just hitting a slow cold start, or it is taking time to fetch the model, or both.
Can anyone verify that Vercel is caching the model when I don't include it in my package?
I'd say yes, given that the Vercel Data Cache limit is 2MB. You can always check the Vercel logs: every request now has a "Request Metrics" section where you'll see what's being cached and what isn't.
For anyone struggling with this issue: you need to cache your models in the function's tmp folder (env.cacheDir = '/tmp/.cache') so that the cache goes to the function's writable /tmp storage rather than the data cache. Vercel runs on Lambda, and Lambda functions have a /tmp folder that lets you store external files which are then available at request time.
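A minimal sketch of that fix in an API route (the model and request shape are just examples; the key line is env.cacheDir):

// Sketch: point the transformers.js cache at /tmp, the only writable path
// inside a Vercel/Lambda serverless function.
import type { NextApiRequest, NextApiResponse } from 'next'
import { pipeline, env } from '@xenova/transformers'

env.cacheDir = '/tmp/.cache' // downloaded model files land in the function's /tmp storage

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  // The first invocation in a fresh execution environment downloads the model into /tmp;
  // subsequent warm invocations reuse the cached files.
  const classifier = await pipeline('sentiment-analysis')
  const out = await classifier(req.body.text)
  res.status(200).json({ result: out })
}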
Thanks so much @arnabtarwani, this fixed my issue with Vercel and explains why. Do you know if this /tmp/ dir is available throughout the lifetime of a Vercel app? Or does it change often, meaning we would need to re-cache the model?
@brandonmburroughs
The /tmp area is preserved for the duration of the execution environment and serves as a transient cache for data between invocations. Each time a new execution environment is created, this area is cleared.
- https://aws.amazon.com/ko/blogs/compute/choosing-between-aws-lambda-data-storage-options-in-web-apps/
They mention that the lifetime of the execution environment is managed by AWS and is not a fixed value. However, some analyses have shown that the average lifetime of an AWS Lambda instance is about 130 minutes, depending on memory sizes.
AWS
- https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtime-environment.html#runtimes-lifecycle
- https://repost.aws/questions/QUKdeptBaRT5OKa-5ZCY3Bpg/how-long-does-a-lambda-instance-can-keep-warm
Analysis
- https://www.pluralsight.com/resources/blog/cloud/how-long-does-aws-lambda-keep-your-idle-functions-around-before-a-cold-start
- https://xebia.com/blog/til-that-aws-lambda-terminates-instances-preemptively/