
[Model]: thenlper/gte-base

Open • gili-vega opened this issue 8 months ago • 1 comment

Which model would you like to support?

https://huggingface.co/thenlper/gte-base

What are the main advantages of this model?

Leaner than gte-large in both model weight and embedding size. It is a common, widely used model, and I would appreciate it being added to the supported models.

gili-vega, Mar 21 '25 22:03

Hey @gili-vega, this model can be added at runtime via the add_custom_model interface.

Here is an example of the interface:

from fastembed import TextEmbedding
from fastembed.common.model_description import PoolingType, ModelSource

TextEmbedding.add_custom_model(
    model="intfloat/multilingual-e5-small",
    pooling=PoolingType.MEAN,
    normalization=True,
    sources=ModelSource(hf="intfloat/multilingual-e5-small"),  # can be used with a `url` to load files from private storage
    dim=384,
    model_file="onnx/model.onnx",  # can be used to load an already supported model with another optimization or quantization, e.g. onnx/model_O4.onnx
)
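documents = ["first example text", "second example text"]  # placeholder input texts to embed
model = TextEmbedding(model_name="intfloat/multilingual-e5-small")
embeddings = list(model.embed(documents))  # embed() returns a generator, so materialize it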
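
For the model requested in this issue, the same call could be adapted roughly as below. This is a minimal sketch, not a verified configuration: the 768-dimensional output, mean pooling with normalization, and the `onnx/model.onnx` path inside the Hugging Face repo are assumptions that should be checked against the model card and repo file listing. The helper name `embed_with_gte_base` is hypothetical. If your fastembed version already ships `thenlper/gte-base`, registering it again is unnecessary and may raise an error.

```python
# Assumed values for thenlper/gte-base; verify against the Hugging Face model card.
GTE_BASE_NAME = "thenlper/gte-base"
GTE_BASE_DIM = 768                      # assumption: gte-base embedding size
GTE_BASE_ONNX_FILE = "onnx/model.onnx"  # assumption: ONNX export path in the repo


def embed_with_gte_base(documents):
    """Register gte-base as a custom model and embed the given texts.

    Hypothetical helper: imports fastembed lazily and downloads the model
    from Hugging Face on first use.
    """
    from fastembed import TextEmbedding
    from fastembed.common.model_description import PoolingType, ModelSource

    TextEmbedding.add_custom_model(
        model=GTE_BASE_NAME,
        pooling=PoolingType.MEAN,  # assumption: mean pooling, as in the e5 example
        normalization=True,
        sources=ModelSource(hf=GTE_BASE_NAME),
        dim=GTE_BASE_DIM,
        model_file=GTE_BASE_ONNX_FILE,
    )
    model = TextEmbedding(model_name=GTE_BASE_NAME)
    # embed() yields one vector per document; collect them into a list
    return list(model.embed(documents))
```

Calling `embed_with_gte_base(["some text"])` should return a list with one vector of length `GTE_BASE_DIM`, provided the assumed settings match the published model.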

joein, Mar 22 '25 09:03