transformers.js
jinaai/jina-clip-v1: support for model names with prefixes
Model description
Prerequisites
- [X] The model is supported in Transformers (i.e., listed here)
- [X] The model can be exported to ONNX with Optimum (i.e., listed here)
Additional information
You just added the onnx files to their HF repo, that's great! 🥳
Now that model files are getting more complex and have a prefix like text_ or vision_ (or even audio_ in the future), transformers.js needs an update: if I see it correctly, it doesn't support loading files other than model.onnx or model_quantized.onnx. At the moment, with 2.17.2, you'll get this kind of error because it cannot locate the files with the above prefixes:
Uncaught (in promise) Error: Could not locate file: "https://huggingface.co/jinaai/jina-clip-v1/resolve/main/onnx/model_quantized.onnx".
at handleError (webpack://semanticfinder/./node_modules/@xenova/transformers/src/utils/hub.js?:248:11)
at getModelFile (webpack://semanticfinder/./node_modules/@xenova/transformers/src/utils/hub.js?:481:24)
at async constructSession (webpack://semanticfinder/./node_modules/@xenova/transformers/src/models.js?:451:18)
at async Promise.all (index 1)
at async PreTrainedModel.from_pretrained (webpack://semanticfinder/./node_modules/@xenova/transformers/src/models.js?:1121:20)
at async AutoModel.from_pretrained (webpack://semanticfinder/./node_modules/@xenova/transformers/src/models.js?:5852:20)
at async Promise.all (index 1)
at async loadItems (webpack://semanticfinder/./node_modules/@xenova/transformers/src/pipelines.js?:3269:5)
at async pipeline (webpack://semanticfinder/./node_modules/@xenova/transformers/src/pipelines.js?:3209:21)
at async self.onmessage (webpack://semanticfinder/./src/js/worker.js?:420:24)
You're probably already working on this, but I still thought it might be useful to have it documented here for anyone else looking for support.
Or is there already another way to specify the name?
Your contribution
I can gladly test!
You can specify model_file_name as one of the options in .from_pretrained(model_id, { model_file_name: 'model' }) :)
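For example, to point each tower at its prefixed weights (a minimal sketch; the names text_model and vision_model are assumptions based on the prefixes described above):

import { CLIPTextModelWithProjection, CLIPVisionModelWithProjection } from '@xenova/transformers';

// Resolves onnx/text_model(_quantized).onnx instead of onnx/model(_quantized).onnx
const text_model = await CLIPTextModelWithProjection.from_pretrained('jinaai/jina-clip-v1', {
    model_file_name: 'text_model',
});

// Resolves onnx/vision_model(_quantized).onnx
const vision_model = await CLIPVisionModelWithProjection.from_pretrained('jinaai/jina-clip-v1', {
    model_file_name: 'vision_model',
});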
Although, do note that the weights I uploaded only work for Transformers.js v3 (unless you manually override the onnxruntime-web/node version to >= 1.16.0).
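If you do override it, a minimal sketch using npm's overrides field in package.json (1.17.0 is an assumed example; any version >= 1.16.0 should satisfy the requirement):

// package.json (excerpt)
{
  "overrides": {
    "onnxruntime-web": "1.17.0",
    "onnxruntime-node": "1.17.0"
  }
}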
See the README for example Transformers.js code:
import { AutoTokenizer, CLIPTextModelWithProjection, AutoProcessor, CLIPVisionModelWithProjection, RawImage, cos_sim } from '@xenova/transformers';
// Load tokenizer and text model
const tokenizer = await AutoTokenizer.from_pretrained('jinaai/jina-clip-v1');
const text_model = await CLIPTextModelWithProjection.from_pretrained('jinaai/jina-clip-v1');
// Load processor and vision model
const processor = await AutoProcessor.from_pretrained('Xenova/clip-vit-base-patch32');
const vision_model = await CLIPVisionModelWithProjection.from_pretrained('jinaai/jina-clip-v1');
// Run tokenization
const texts = ['A blue cat', 'A red cat'];
const text_inputs = tokenizer(texts, { padding: true, truncation: true });
// Compute text embeddings
const { text_embeds } = await text_model(text_inputs);
// Read images and run processor
const urls = [
'https://i.pinimg.com/600x315/21/48/7e/21487e8e0970dd366dafaed6ab25d8d8.jpg',
'https://i.pinimg.com/736x/c9/f2/3e/c9f23e212529f13f19bad5602d84b78b.jpg'
];
const images = await Promise.all(urls.map(url => RawImage.read(url)));
const image_inputs = await processor(images);
// Compute vision embeddings
const { image_embeds } = await vision_model(image_inputs);
// Compute similarities
console.log(cos_sim(text_embeds[0].data, text_embeds[1].data)); // text embedding similarity
console.log(cos_sim(text_embeds[0].data, image_embeds[0].data)); // text-image cross-modal similarity
console.log(cos_sim(text_embeds[0].data, image_embeds[1].data)); // text-image cross-modal similarity
console.log(cos_sim(text_embeds[1].data, image_embeds[0].data)); // text-image cross-modal similarity
console.log(cos_sim(text_embeds[1].data, image_embeds[1].data)); // text-image cross-modal similarity
Really frustrated with the v3 situation: there is no v3 release for Node.js yet, and the new ONNX weights only work with v3.
The code doesn't work at all. When I try to use optimum-cli to build the ONNX model myself, Optimum doesn't support the nomic-bert model type (nomic-embed-text-v1.5 can be built, but nomic-embed-vision-v1.5 fails), so there is no way to run the demo code even on the stable version of Transformers.js. If v3 isn't ready, please don't release ONNX weights that only work with v3.
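For reference, the failing export attempt would look something like this (a sketch; the nomic-ai/nomic-embed-vision-v1.5 repo path and output directory are assumptions based on the model names above):

# fails because Optimum has no ONNX export config for the nomic-bert model type (per the report above)
optimum-cli export onnx --model nomic-ai/nomic-embed-vision-v1.5 nomic-embed-vision-onnx/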