fastembed
[Model]: Multi-modal embedding: Alibaba-NLP/gme-Qwen2-VL-2B-Instruct
Which model would you like to support?
Hey!
Qwen and Alibaba have open-sourced a whole range of impressive multimodal models. These models have a lot of potential and could enable some really cool advances, so I'd appreciate it if you could give them some attention.
In particular, the GME and GTE series stand out; they might just hold the key to some exciting new developments. Have a look and see what you think!
https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-2B-Instruct
https://huggingface.co/Alibaba-NLP/gte-modernbert-base
What are the main advantages of this model?
Multi-modal Retrieval-Augmented Generation (RAG) has become a very active area recently and is attracting a lot of attention in the field.
Also see: https://github.com/BIGBALLON/GME-Search
Hey @xiabo0816 @BIGBALLON,
Unfortunately, the providers of https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-2B-Instruct have not converted the model to ONNX, and we might not be able to do so ourselves in the foreseeable future.
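Since fastembed can only load ONNX weights, a quick way to tell whether a given Hugging Face repo is even a candidate is to check its file listing for a `.onnx` file. A minimal sketch; `has_onnx_weights` is a hypothetical helper, not part of fastembed, and fetching the listing via `huggingface_hub` needs network access:

```python
def has_onnx_weights(files: list[str]) -> bool:
    """Return True if any listed repo file is an ONNX model."""
    return any(f.endswith(".onnx") for f in files)

# The file listing itself can be fetched with huggingface_hub, e.g.:
#   from huggingface_hub import list_repo_files
#   has_onnx_weights(list_repo_files("Alibaba-NLP/gte-modernbert-base"))
```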
Regarding the second model: it has model.onnx files, so it can be added to fastembed at runtime. It'll look like this (I haven't tested this particular example; it might require minimal adjustments):
```python
from fastembed import TextEmbedding
from fastembed.common.model_description import PoolingType, ModelSource

TextEmbedding.add_custom_model(
    model="Alibaba-NLP/gte-modernbert-base",
    pooling=PoolingType.MEAN,  # might be CLS or DISABLED, depending on the model's output
    normalization=True,
    sources=ModelSource(hf="Alibaba-NLP/gte-modernbert-base"),
    dim=768,  # gte-modernbert-base outputs 768-dimensional embeddings
    model_file="onnx/model.onnx",
)

model = TextEmbedding(model_name="Alibaba-NLP/gte-modernbert-base")
documents = ["fastembed supports custom ONNX models", "ModernBERT is a BERT successor"]
embeddings = list(model.embed(documents))
```
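Once the model is registered, the returned embeddings can be compared directly. With `normalization=True` the vectors are unit-length, so a dot product already gives the cosine similarity. A small numpy sketch (the synthetic vectors here stand in for real fastembed output):

```python
import numpy as np

def cosine_similarity(a, b) -> float:
    # Generic form; for already-normalized fastembed output the
    # denominator is 1.0 and this reduces to a plain dot product.
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```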