transformers.js
[Question] Build step process for Vercel
Hi, I am currently in the process of trying to deploy to Vercel using Next.js. I am using pnpm as my package manager and have put the model in the public folder. I hit this error when the build runs. Is some post-install step necessary, as in #295?
I don't understand why this step would be necessary.
An error occurred while writing the file to cache: [Error: ENOENT: no such file or directory, mkdir '/var/task/node_modules/.pnpm/@[email protected]/node_modules/@xenova/transformers/.cache'
I don't quite see why that would happen in the build stage: the model is only cached if (1) it doesn't already exist locally and (2) the model is actually accessed 👀
My guess is that the location or structure of the model folder is incorrect: can you share the code you're using as well as the folder structure?
Sure. It's a Next.js app (version 12). The code is below, and all the models are under public.
import type { NextApiRequest, NextApiResponse } from 'next'
import {
  env,
  AutoTokenizer,
  AutoProcessor,
  AutoModel,
  RawImage,
  softmax,
} from '@xenova/transformers'

//@ts-ignore
env.allowRemoteModels = false
env.localModelPath = process.env.NEXT_PUBLIC_URL + '/models/'
env.cacheDir = process.env.NEXT_PUBLIC_URL + '/cache/'

// PredictResV2 is an app-specific response type defined elsewhere in the project
export default async function handler(req: NextApiRequest, res: NextApiResponse<PredictResV2>) {
  const { classLabels, targetLabel, prompts, uploadDate, drawing } = req.body
  // const { tokenizer, processor, model } = await PipelineSingleton.getInstance()

  // Load the CLIP tokenizer, processor, and model
  const modelID = 'Xenova/clip-vit-base-patch16'
  const tokenizer = await AutoTokenizer.from_pretrained(modelID)
  const processor = await AutoProcessor.from_pretrained(modelID)
  const model = await AutoModel.from_pretrained(modelID)

  const blob = await (await fetch(drawing)).blob()
  const image = await RawImage.fromBlob(blob)
  const image_inputs = await processor(image)

  const targetLabels = targetLabel.split(',')

  // Create label tokens
  // ----------------------------------------------------------------------
  // Filter out target labels from class labels
  const class_labels = classLabels.filter((x: string) => !targetLabels.includes(x))
  // Only use the first prompt for class labels
  const class_label_prompts = class_labels.map((label: string) => `${prompts[0]} ${label}`)
  // Use all prompts for target labels
  const compound_labels: Record<string, string[]> = {}
  let all_renamed_labels: string[] = []
  let target_label_prompts: string[] = []
  for (const t of targetLabels) {
    target_label_prompts = prompts.map((prompt: string) => `${prompt} ${t}`)
    const renamed_target_labels = Array.from(
      { length: target_label_prompts.length },
      (_, i) => `${t}_${i}`,
    )
    compound_labels[t] = renamed_target_labels
    all_renamed_labels = all_renamed_labels.concat(renamed_target_labels)
  }
  const labels = all_renamed_labels.concat(class_labels)

  const tokens = tokenizer(target_label_prompts.concat(class_label_prompts), {
    padding: true,
    truncation: true,
  })
  const output = await model({ ...tokens, ...image_inputs })
  const logits_per_image = output.logits_per_image.data
  // console.log('logits_per_image', logits_per_image)
  const probs = softmax(logits_per_image)

  // Return this as a list
  const predictRes: PredictResV2 = {
    probs: probs,
    labels,
    compoundLabels: compound_labels,
    targetLabel,
    classLabels,
    prompts,
    uploadDate,
  }
  res.status(200).json(predictRes)
}
So it seems that this is applicable to any serverless function framework, as I can't get this going on Firebase either, even without the env settings. Would love to have some help @xenova
import { onRequest } from "firebase-functions/v2/https"
import * as logger from "firebase-functions/logger"
import { pipeline } from "@xenova/transformers"

export const test = onRequest(async (request, response) => {
  logger.info("Hello logs!", { structuredData: true })
  const classifier = await pipeline(
    "zero-shot-image-classification",
    "Xenova/clip-vit-base-patch32",
  )
  const url = "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg"
  const out = await classifier(url, ["tiger", "horse", "dog"])
  response.status(200).json({ message: out })
})
Hi there - do you think this could be a permission error? Also, does the directory shown in fact exist?
@xenova As I have gotten the function to work with yarn, it seems like a problem with how cloud vendors handle pnpm. Maybe the vendors have some kind of centralized pnpm store so that they can cache shared packages? In any case, this works when using yarn instead of pnpm, which I think might be the case for Vercel as well.
import * as functions from "firebase-functions"
import * as logger from "firebase-functions/logger"
import { pipeline } from "@xenova/transformers"

export const test = functions
  .runWith({ memory: "1GB", timeoutSeconds: 360 })
  .https.onRequest(async (request, response) => {
    logger.info("Hello logs!", { structuredData: true })
    const classifier = await pipeline(
      "zero-shot-image-classification",
      "Xenova/clip-vit-base-patch32",
    )
    const url = "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg"
    const out = await classifier(url, ["tiger", "horse", "dog"])
    response.status(200).json({ message: out })
  })
Vercel prevents write access everywhere except /tmp for serverless functions: https://github.com/orgs/vercel/discussions/1560
I got a working API with the option below:
option.useFSCache = false
However, the problem is that Vercel fetches the ONNX model for each API request (with different inputs, to prevent caching). That is quite slow (about 3 s for a small model).
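For reference, here is a minimal sketch of that workaround as a pages API route. It assumes the flag is set on the library's env export (the snippet above calls the object option), and the request/response shape is just an example:

// Sketch: disable the filesystem cache so transformers.js never tries to write
// under the read-only node_modules path on Vercel.
import type { NextApiRequest, NextApiResponse } from 'next'
import { pipeline, env } from '@xenova/transformers'

env.useFSCache = false // don't persist downloaded model files to disk

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  // Without a cache, the model is fetched again whenever it isn't already in memory,
  // which is the ~3 s slowness described above.
  const classifier = await pipeline(
    'zero-shot-image-classification',
    'Xenova/clip-vit-base-patch32',
  )
  const out = await classifier(req.body.url, ['tiger', 'horse', 'dog'])
  res.status(200).json({ message: out })
}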
I tried to use a local model via Vercel's includeFiles option (https://vercel.com/docs/projects/project-configuration#functions):
option.localModelPath = join(process.cwd(), 'models')
"includeFiles": "models/**/*",
with a models folder that contains submodule repos holding the ONNX files. But this error occurs and I'm still investigating:
…ages/api/recommend-pages.js:1:1904)
SyntaxError: Unexpected token v in JSON at position 0
at JSON.parse (<anonymous>)
at getModelJSON (file:///var/task/node_modules/.pnpm/@[email protected]/node_modules/@xenova/transformers/src/utils/hub.js:551:17)
at async Promise.all (index 0)
at async loadTokenizer (file:///var/task/node_modules/.pnpm/@[email protected]/node_modules/@xenova/transformers/src/tokenizers.js:52:16)
at async AutoTokenizer.from_pretrained (file:///var/task/node_modules/.pnpm/@[email protected]/node_modules/@xenova/transformers/src/tokenizers.js:3824:48)
at async Promise.all (index 0)
at async loadItems (file:///var/task/node_modules/.pnpm/@[email protected]/node_modules/@xenova/transformers/src/pipelines.js:2305:5)
at async pipeline (file:///var/task/node_modules/.pnpm/@[email protected]/node_modules/@xenova/transformers/src/pipelines.js:2251:19)
at async searchNotion (/var/task/.next/server/pages/api/recommend-pages.js:1:1904)
Error: Failed to load model because protobuf parsing failed.
at new OnnxruntimeSessionHandler (/var/task/node_modules/.pnpm/[email protected]/node_modules/onnxruntime-node/dist/backend.js:27:92)
at /var/task/node_modules/.pnpm/[email protected]/node_modules/onnxruntime-node/dist/backend.js:64:29
at process.processTicksAndRejections (node:internal/process/task_queues:77:11)
Something went wrong during model construction (most likely a missing operation). Using `wasm` as a fallback.
Aborted(Error: ENOENT: no such file or directory, open '/var/task/node_modules/.pnpm/@[email protected]/node_modules/@xenova/transformers/dist/ort-wasm-simd.wasm')
failed to asynchronously prepare wasm: RuntimeError: Aborted(Error: ENOENT: no such file or directory, open '/var/task/node_modules/.pnpm/@[email protected]/node_modules/@xenova/transformers/dist/ort-wasm-simd.wasm'). Build with -sASSERTIONS for more info.
Aborted(RuntimeError: Aborted(Error: ENOENT: no such file or directory, open '/var/task/node_modules/.pnpm/@[email protected]/node_modules/@xenova/transformers/dist/ort-wasm-simd.wasm'). Build with -sASSERTIONS for more info.)
fe92cef9-6dfb-5618-8c96-6a3869e036dc
SyntaxError: Unexpected token v in JSON at position 0
at JSON.parse (<anonymous>)
at getModelJSON (file:///var/task/node_modules/.pnpm/@[email protected]/node_modules/@xenova/transformers/src/utils/hub.js:551:17)
at async Promise.all (index 0)
at async loadTokenizer (file:///var/task/node_modules/.pnpm/@[email protected]/node_modules/@xenova/transformers/src/tokenizers.js:52:16)
at async AutoTokenizer.from_pretrained (file:///var/task/node_modules/.pnpm/@[email protected]/node_modules/@xenova/transformers/src/tokenizers.js:3824:48)
at async Promise.all (index 0)
at async loadItems (file:///var/task/node_modules/.pnpm/@[email protected]/node_modules/@xenova/transformers/src/pipelines.js:2305:5)
at async pipeline (file:///var/task/node_modules/.pnpm/@[email protected]/node_modules/@xenova/transformers/src/pipelines.js:2251:19)
at async searchNotion (/var/task/.next/server/pages/api/recommend-pages.js:1:1904)
Error: Runtime exited with error: exit status 1
Runtime.ExitError
According to this issue https://github.com/xenova/transformers.js/issues/295, the protobuf error can be caused by a memory issue. But my memory is 3008 MB (Pro plan), and the API works with the same model when it is fetched remotely from the HF Hub.
I spent many hours yesterday reading various articles and discussions to get this working, and I seem to have succeeded.
Let me try to summarize it here; at some point I'll make a sample repo. My use case is part of a larger workflow, so this is only tested with a Next.js 13.5 API route. My app will run this thousands of times a day and I don't want to be pounding Hugging Face to download the model over and over.
First I created my function to run the model using a singleton approach:
import { pipeline, env, Tensor, Pipeline } from '@xenova/transformers';

//env.backends.onnx.wasm.numThreads = 1;
env.allowRemoteModels = false;
env.allowLocalModels = true;
env.localModelPath = process.cwd() + "/models";

class EmbeddingSingleton {
  static task = 'feature-extraction';
  static model = 'Supabase/gte-small';
  static instance: Promise<Pipeline>;

  static getInstance() {
    this.instance ??= pipeline(this.task, this.model);
    return this.instance;
  }
}

export async function embedding_vector(content: string): Promise<unknown[]> {
  // really returns an array of numbers, but I'm not fighting TS right now; someone can fix it later
  const extractor = await EmbeddingSingleton.getInstance();
  const output: Tensor = await extractor(content, {
    pooling: 'mean',
    normalize: true,
  });
  //logger.debug(`tensor = `, output);

  // Extract the embedding output
  const embedding = Array.from(output.data);
  return embedding;
}
Then I created an endpoint that calls this function and returns JSON for me. This is standard route-handling stuff, so no need for a full example here, but a minimal sketch follows. It is important to make it dynamic with export const dynamic = 'force-dynamic'; or else it will be rendered statically at build time.
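A minimal sketch of such a route handler, assuming the app router and a hypothetical app/api/embedding/route.ts path (adjust the import to wherever embedding_vector actually lives):

// app/api/embedding/route.ts (hypothetical path)
import { NextResponse } from 'next/server';
import { embedding_vector } from '@/lib/embedding'; // assumed location of the function above

export const dynamic = 'force-dynamic'; // don't render this statically at build time

export async function POST(request: Request) {
  const { content } = await request.json();
  const embedding = await embedding_vector(content);
  return NextResponse.json({ embedding });
}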
In my next.config.mjs file I added the following (and installed the plugin as well):
import CopyPlugin from "copy-webpack-plugin";

// this bit goes in the webpack section:
config.plugins.push(
  new CopyPlugin({
    patterns: [
      {
        // copy the transformers.js models we need into the build folder
        from: "models",
        to: "models",
      },
    ],
  }),
);

// this next bit goes in the experimental section. I'm not 100% sure what this is doing or if it is really necessary.
// Indicate that these packages should not be bundled by webpack (for transformers.js)
serverComponentsExternalPackages: ['sharp', 'onnxruntime-node'],
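Putting those fragments in context, here is a sketch of how the whole next.config.mjs might look. The surrounding structure is an assumption rather than a verbatim copy of my file; merge it with whatever config you already have:

// next.config.mjs (sketch; merge with your existing config)
import CopyPlugin from "copy-webpack-plugin";

/** @type {import('next').NextConfig} */
const nextConfig = {
  experimental: {
    // keep these packages out of the webpack bundle (needed for transformers.js)
    serverComponentsExternalPackages: ["sharp", "onnxruntime-node"],
  },
  webpack: (config) => {
    // copy the local models folder into the build output
    config.plugins.push(
      new CopyPlugin({
        patterns: [{ from: "models", to: "models" }],
      })
    );
    return config;
  },
};

export default nextConfig;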
Then I made my local workstation download the models into the transformers cache: I commented out the env lines that force it to use a local copy and ran my test suite. Once that was done, I just moved them into the models folder:
mkdir models
mv node_modules/@xenova/transformers/.cache/Supabase models
Uncomment the env lines again and test that it works locally: it shouldn't re-populate the transformers cache folder.
Commit and push to Vercel.
Profit (well, this part I'm still figuring out).
Note this will only work with the Node.js runtime on Vercel; you can pin it with the segment config sketched below. If you specify the "edge" runtime, it will not be able to read the local files, and I have yet to solve this. Also, with the edge runtime I cannot get it to run at all, because it errors on wasm loading. But that's for another ticket...
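For completeness, the route segment config referenced above (a standard Next.js app-router export, placed in the route handler file):

// In the route handler (e.g. the hypothetical app/api/embedding/route.ts above),
// alongside the dynamic flag already shown:
export const runtime = 'nodejs'; // use the Node.js runtime; 'edge' cannot read the bundled model files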
@khera does this only work in the pro plan? I am also thinking about putting it in the vercel blob, if that makes it any better.
@kyeshmz vercel server blob's size limit is too small (4.5 MB) for a model
@khera does this only work in the pro plan? I am also thinking about putting it in the vercel blob, if that makes it any better.
I don't know, I have a pro plan. The model I'm using is only 32MB.
@kyeshmz vercel server blob's size limit is too small (4.5 MB) for a model
https://vercel.com/docs/functions/serverless-functions/runtimes#size-limits says it is 250MB. The 4.5MB is compressed response output size.
@khera I think he meant Vercel Blob storage
Ah yeah, Vercel Blob storage. But I guess I will put it up on Cloudflare R2 anyway.
Can anyone verify that Vercel is caching the model when I don't include it in my package? When I look at the Vercel usage page, it shows nothing in the data cache, which is where one would expect anything retrieved via fetch() to show up in my Next.js app. I'm dynamically fetching the default sentiment-analysis model, which should be about 60KB.
It is difficult to tell from the startup time whether I'm just hitting a slow cold start, or it is taking time to fetch the model, or both.
Can anyone verify that Vercel is caching the model when I don't include it in my package?
I'd say yes, given that the Vercel Data Cache limit is 2MB. You can always check the Vercel logs: every request now has a "Request Metrics" section where you'll see what's being cached and what isn't.
For anyone struggling with this issue: you need to cache your models in the function's tmp folder (env.cacheDir = '/tmp/.cache') so that the cache goes to the function's writable /tmp storage rather than the data cache. Vercel runs on Lambda, and Lambda functions have a /tmp folder that lets you store external files which are then available at request time.
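A minimal sketch of that fix in an API route (the model and request shape are just examples; the key line is env.cacheDir):

// Sketch: point the transformers.js cache at /tmp, the only writable path
// inside a Vercel/Lambda serverless function.
import type { NextApiRequest, NextApiResponse } from 'next'
import { pipeline, env } from '@xenova/transformers'

env.cacheDir = '/tmp/.cache' // downloaded model files land in the function's /tmp storage

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  // The first invocation in a fresh execution environment downloads the model into /tmp;
  // subsequent warm invocations reuse the cached files.
  const classifier = await pipeline('sentiment-analysis')
  const out = await classifier(req.body.text)
  res.status(200).json({ result: out })
}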
Thanks so much @arnabtarwani, this fixed my issue with Vercel and explains why. Do you know if this /tmp/ dir is available throughout the lifetime of a Vercel app? Or does it change often, meaning we would need to re-cache the model?
@brandonmburroughs
The /tmp area is preserved for the duration of the execution environment and serves as a transient cache for data between invocations. Each time a new execution environment is created, this area is cleared.
- https://aws.amazon.com/ko/blogs/compute/choosing-between-aws-lambda-data-storage-options-in-web-apps/
They mention that the lifetime of the execution environment is managed by AWS and is not a fixed value. However, some analyses have shown that the average lifetime of an AWS Lambda instance is about 130 minutes, depending on memory sizes.
AWS
- https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtime-environment.html#runtimes-lifecycle
- https://repost.aws/questions/QUKdeptBaRT5OKa-5ZCY3Bpg/how-long-does-a-lambda-instance-can-keep-warm
Analysis
- https://www.pluralsight.com/resources/blog/cloud/how-long-does-aws-lambda-keep-your-idle-functions-around-before-a-cold-start
- https://xebia.com/blog/til-that-aws-lambda-terminates-instances-preemptively/