Instruction: How to add BAAI/bge-m3 embedder
Hi everyone. This is a working example of how to add the BAAI/bge-m3 embedder to Verba.
- In goldenverba/components/embedding, create a copy of MiniLMEmbedder.py and rename it to "BGEM3Embedder.py".
- Edit the new file: rename the MiniLMEmbedder class to BGEM3Embedder and replace the model name and related strings accordingly:
from tqdm import tqdm
from wasabi import msg
from weaviate import Client

from goldenverba.components.embedding.interface import Embedder
from goldenverba.components.reader.document import Document


class BGEM3Embedder(Embedder):
    """
    BGEM3Embedder for Verba.
    """

    def __init__(self):
        super().__init__()
        self.name = "BGEM3Embedder"
        self.requires_library = ["torch", "transformers"]
        self.description = "Embeds and retrieves objects using the BAAI/bge-m3 model"
        self.vectorizer = "BAAI/bge-m3"
        self.model = None
        self.tokenizer = None
        try:
            import torch
            from transformers import AutoModel, AutoTokenizer

            def get_device():
                # Prefer CUDA, then Apple MPS, then fall back to the CPU.
                if torch.cuda.is_available():
                    return torch.device("cuda")
                elif torch.backends.mps.is_available():
                    return torch.device("mps")
                else:
                    return torch.device("cpu")

            self.device = get_device()
            self.model = AutoModel.from_pretrained(
                "BAAI/bge-m3", device_map=self.device
            )
            self.tokenizer = AutoTokenizer.from_pretrained(
                "BAAI/bge-m3", device_map=self.device
            )
            self.model = self.model.to(self.device)
        ...
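One detail worth guarding against: as far as I know, `torch.backends.mps` only exists in newer PyTorch releases (it arrived around version 1.12), so the `get_device` helper above can raise `AttributeError` on older installs. Here is a minimal defensive sketch, assuming you are free to restructure the helper. The function name `pick_device_name` is hypothetical and not part of Verba:

```python
def pick_device_name() -> str:
    """Return "cuda", "mps", or "cpu", preferring GPU backends when usable."""
    try:
        import torch
    except ImportError:
        # Without torch there is no accelerator to pick, so report the CPU.
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"

    # getattr avoids an AttributeError on torch builds without the MPS backend.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


if __name__ == "__main__":
    print(pick_device_name())
```

The returned string can be passed to `torch.device(...)` in place of the value computed by `get_device()`.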
- In manager.py in goldenverba/components/embedding, make these changes:
from goldenverba.components.embedding.MiniLMEmbedder import MiniLMEmbedder
from goldenverba.components.embedding.BGEM3Embedder import BGEM3Embedder  # new import
from goldenverba.components.reader.document import Document


class EmbeddingManager:
    def __init__(self):
        self.embedders: dict[str, Embedder] = {
            "MiniLMEmbedder": MiniLMEmbedder(),
            "BGEM3Embedder": BGEM3Embedder(),  # new entry
            "ADAEmbedder": ADAEmbedder(),
            "CohereEmbedder": CohereEmbedder(),
        }
        ...
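The manager is essentially a name-to-instance registry, so the dictionary key is presumably what the rest of the application uses to pick an embedder. A simplified, self-contained sketch of that pattern follows. `FakeEmbedder`, `SketchEmbeddingManager`, and `select` are hypothetical stand-ins for illustration, not Verba's actual classes or methods:

```python
from dataclasses import dataclass


@dataclass
class FakeEmbedder:
    """Stand-in for an Embedder subclass; holds only the fields used here."""
    name: str
    vectorizer: str


class SketchEmbeddingManager:
    def __init__(self) -> None:
        # Keys are the names used to look an embedder up later.
        self.embedders: dict[str, FakeEmbedder] = {
            "MiniLMEmbedder": FakeEmbedder("MiniLMEmbedder", "MiniLM"),
            "BGEM3Embedder": FakeEmbedder("BGEM3Embedder", "BAAI/bge-m3"),
        }
        self.selected = self.embedders["MiniLMEmbedder"]

    def select(self, name: str) -> bool:
        """Switch the active embedder; return False for unknown names."""
        if name not in self.embedders:
            return False
        self.selected = self.embedders[name]
        return True


manager = SketchEmbeddingManager()
manager.select("BGEM3Embedder")
print(manager.selected.vectorizer)  # prints BAAI/bge-m3
```

If the new embedder does not show up after the change, a mismatch between the dictionary key and the name being requested is one of the first things to check.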
- In goldenverba/components/schema/schema_generation.py, add "BAAI/bge-m3" to EMBEDDINGS:
VECTORIZERS = {"text2vec-openai", "text2vec-cohere"} # Needs to match with Weaviate modules
EMBEDDINGS = {"MiniLM", "BAAI/bge-m3"} # Custom Vectors
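The string added to EMBEDDINGS presumably has to match the `self.vectorizer` value set in BGEM3Embedder exactly ("BAAI/bge-m3"), since that is the one name the two files share. A rough sketch of the kind of membership check this implies is below. `classify_vectorizer` is a hypothetical helper written for illustration, not a function from schema_generation.py:

```python
VECTORIZERS = {"text2vec-openai", "text2vec-cohere"}  # Weaviate modules
EMBEDDINGS = {"MiniLM", "BAAI/bge-m3"}  # custom vectors


def classify_vectorizer(vectorizer: str) -> str:
    """Return which group a vectorizer name falls into (hypothetical helper)."""
    if vectorizer in VECTORIZERS:
        return "weaviate-module"
    if vectorizer in EMBEDDINGS:
        return "custom-embedding"
    return "unknown"


print(classify_vectorizer("BAAI/bge-m3"))  # prints custom-embedding
print(classify_vectorizer("baai/bge-m3"))  # prints unknown (match is case-sensitive)
```

Set membership is an exact, case-sensitive comparison, so a typo or different casing in either file would leave the new embedder unrecognized.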
- Done! Start Verba!
P.S. If you want to use an English-specific model such as "BAAI/bge-large-en", substitute "BAAI/bge-large-en" for "BAAI/bge-m3" everywhere and name the file and class accordingly.