Verba
Instruction: How to add BAAI/bge-m3 embedder
Hi everyone. This is a working example of how to add the BAAI/bge-m3 embedder to Verba.
- Create a copy of MiniLMEmbedder.py in goldenverba/components/embedding and rename it to "BGEM3Embedder.py".
- Edit the new file: rename the MiniLMEmbedder class to BGEM3Embedder and update the name, description, and model identifiers to match:
from tqdm import tqdm
from wasabi import msg
from weaviate import Client

from goldenverba.components.embedding.interface import Embedder
from goldenverba.components.reader.document import Document


class BGEM3Embedder(Embedder):
    """
    BGEM3Embedder for Verba.
    """

    def __init__(self):
        super().__init__()
        self.name = "BGEM3Embedder"
        self.requires_library = ["torch", "transformers"]
        self.description = "Embeds and retrieves objects using SentenceTransformer's BAAI/bge-m3 model"
        self.vectorizer = "BAAI/bge-m3"
        self.model = None
        self.tokenizer = None
        try:
            import torch
            from transformers import AutoModel, AutoTokenizer

            def get_device():
                if torch.cuda.is_available():
                    return torch.device("cuda")
                elif torch.backends.mps.is_available():
                    return torch.device("mps")
                else:
                    return torch.device("cpu")

            self.device = get_device()
            self.model = AutoModel.from_pretrained(
                "BAAI/bge-m3", device_map=self.device
            )
            self.tokenizer = AutoTokenizer.from_pretrained(
                "BAAI/bge-m3", device_map=self.device
            )
            self.model = self.model.to(self.device)
            ...
- In manager.py in goldenverba/components/embedding, make these changes:
from goldenverba.components.embedding.MiniLMEmbedder import MiniLMEmbedder
from goldenverba.components.embedding.BGEM3Embedder import BGEM3Embedder
from goldenverba.components.reader.document import Document


class EmbeddingManager:
    def __init__(self):
        self.embedders: dict[str, Embedder] = {
            "MiniLMEmbedder": MiniLMEmbedder(),
            "BGEM3Embedder": BGEM3Embedder(),
            "ADAEmbedder": ADAEmbedder(),
            "CohereEmbedder": CohereEmbedder(),
        }
        ...
- Make changes in goldenverba/components/schema/schema_generation.py:
VECTORIZERS = {"text2vec-openai", "text2vec-cohere"} # Needs to match with Weaviate modules
EMBEDDINGS = {"MiniLM", "BAAI/bge-m3"} # Custom Vectors
- Done! Start Verba!
P.S. If you want to use an English-specific model such as "BAAI/bge-large-en", replace "BAAI/bge-m3" with "BAAI/bge-large-en" and name the file and class accordingly.
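The registration step in manager.py can be pictured with a small, self-contained sketch. The StubEmbedder and StubEmbeddingManager classes below are hypothetical stand-ins, not Verba's real Embedder or EmbeddingManager; they only show the idea of a name-to-instance dictionary in which the new embedder needs its own key.

```python
from dataclasses import dataclass


@dataclass
class StubEmbedder:
    """Hypothetical stand-in for Verba's Embedder; not the real interface."""

    name: str
    vectorizer: str


class StubEmbeddingManager:
    """Toy registry mirroring the dict-of-embedders layout from manager.py."""

    def __init__(self) -> None:
        self.embedders: dict[str, StubEmbedder] = {
            "MiniLMEmbedder": StubEmbedder("MiniLMEmbedder", "MiniLM"),
            "BGEM3Embedder": StubEmbedder("BGEM3Embedder", "BAAI/bge-m3"),
        }

    def get(self, name: str) -> StubEmbedder:
        # Fail loudly, listing the registered names, if the key is missing.
        if name not in self.embedders:
            known = ", ".join(sorted(self.embedders))
            raise KeyError(f"unknown embedder {name!r}; registered: {known}")
        return self.embedders[name]


manager = StubEmbeddingManager()
print(manager.get("BGEM3Embedder").vectorizer)
```

If the key is missing from the dictionary, a lookup by name has nothing to return, which is consistent with a new embedder not being offered until manager.py is updated.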
Great work! We'll look into this for the next update
@bakongi I've done the same as you but I can't figure out where to choose this custom embedder in the frontend of Verba. Any suggestions please?
How did you install Verba: with pip or from source?
@bakongi I installed Verba using pip install goldenverba, as shown in the documentation.
OK. Where did you make the changes (which folder path)? I think you should make the changes in the Python shared library folder where Verba is installed.
@bakongi I made the changes in exactly the files that you mentioned. "I think you should make the changes in the Python shared library folder where Verba is installed": can you please elaborate?
@bakongi One more thing: the new embedding model that I added doesn't seem to be downloaded from Hugging Face. My guess is that an API key needs to be configured, or does sentence_transformers do the whole job? Thank you.
The location of the Python shared library folder where installed libraries are stored depends on your operating system and the environment in which Python is running. Here are the typical locations for different environments:
On Unix-like systems (Linux, macOS):
- System-wide installations: Libraries are generally stored in /usr/lib/pythonX.Y/site-packages or /usr/local/lib/pythonX.Y/site-packages (where X.Y is your Python version, e.g., python3.9).
- User-specific installations: If you've installed libraries using pip with the --user option: ~/.local/lib/pythonX.Y/site-packages
- Virtual environments: If you're using a virtual environment (created with venv or virtualenv), libraries are stored within the virtual environment directory: <virtualenv_path>/lib/pythonX.Y/site-packages
On Windows:
- System-wide installations: Libraries are typically found in C:\PythonXY\Lib\site-packages (where XY is your Python version, e.g., Python39).
- User-specific installations: If you've installed libraries using pip with the --user option: C:\Users\<YourUsername>\AppData\Roaming\Python\PythonXY\site-packages
- Virtual environments: If you're using a virtual environment, libraries are stored within the virtual environment directory: <virtualenv_path>\Lib\site-packages
Checking the location programmatically:
You can also check the location of installed libraries programmatically using Python:
import site
import sys
# List all site-packages directories
print(site.getsitepackages())
# List user-specific site-packages directory
print(site.getusersitepackages())
# List all paths where Python looks for packages
print(sys.path)
This code will print the paths where Python searches for libraries, including the site-packages directories.
I see. I've installed Verba with pip install goldenverba in a virtual environment created using Python's venv, and it's located in the project directory. Is this correct?
When you install a Python package in a virtual environment, the package is installed within the directory structure of the virtual environment itself. This ensures that the package dependencies are isolated from the global Python environment and any other virtual environments you might have.
Here's a typical structure of a virtual environment:
<project_directory>/
├── <venv_name>/
│   ├── bin/                  # Executables and scripts (Linux/macOS) or Scripts/ (Windows)
│   ├── lib/                  # Libraries (Linux/macOS) or Lib/ (Windows)
│   │   └── pythonX.Y/
│   │       └── site-packages/
│   │           └── goldenverba/
├── your_project_files/
└── ...
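As a rough illustration of that layout, the sketch below builds the conventional site-packages path for a given virtual environment. The helper name venv_site_packages and the example paths are made up for this illustration, and actual interpreters may use a different layout, so treat the result as a guide only.

```python
from pathlib import PurePosixPath, PureWindowsPath


def venv_site_packages(venv_path: str, python_version: str, windows: bool = False):
    """Return the conventional site-packages path inside a virtual environment.

    Follows the typical layouts described in this thread; real installations
    may differ, so this is a guide rather than a guarantee.
    """
    if windows:
        # Windows venvs use <venv>\Lib\site-packages with no version folder.
        return PureWindowsPath(venv_path) / "Lib" / "site-packages"
    # Linux/macOS venvs use <venv>/lib/pythonX.Y/site-packages.
    return PurePosixPath(venv_path) / "lib" / f"python{python_version}" / "site-packages"


# Hypothetical example paths; substitute your own project directory.
print(venv_site_packages("/home/me/project/.venv", "3.10") / "goldenverba")
print(venv_site_packages(r"C:\project\.venv", "3.10", windows=True) / "goldenverba")
```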
So what should I do in this case for the project to run correctly?
Go to <venv_name>\Lib\site-packages\goldenverba and make the necessary changes to the files in the "components" folder and its subfolders,
or, if you downloaded the source files and made the changes there, just run
pip install -e .
in your virtual environment.
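If it is unclear which copy of goldenverba the active interpreter actually imports, a short check along these lines may help. It is a sketch using only the standard library; the package_dir helper is a name introduced here, and the "components/embedding" subpath is taken from the steps earlier in this thread.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_dir(package: str) -> Optional[Path]:
    """Return the directory Python would import `package` from, or None."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at the package's __init__.py; its parent is the folder.
    return Path(spec.origin).parent


location = package_dir("goldenverba")
if location is None:
    print("goldenverba is not importable from this interpreter")
else:
    print(location / "components" / "embedding")
```

Run it with the same Python that starts Verba. After pip install -e . the printed path should point into your source checkout rather than site-packages.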
Not sure if this is your problem, @moncefarajdal, but I think you need to install the Hugging Face extras: pip install goldenverba[huggingface]
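One quick way to test that theory is to check whether the libraries listed in the embedder's requires_library are importable in the environment that runs Verba. This is a minimal sketch using only the standard library; missing_libraries is a helper name made up for this example.

```python
import importlib.util


def missing_libraries(required: list[str]) -> list[str]:
    """Return the names in `required` that cannot be imported here."""
    return [name for name in required if importlib.util.find_spec(name) is None]


# The list below copies requires_library from the BGEM3Embedder class.
missing = missing_libraries(["torch", "transformers"])
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("torch and transformers are both importable")
```

If anything is reported missing, the imports inside the embedder's try block would fail before from_pretrained is ever reached, which could explain a model that never downloads.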
Hi everyone. This is working sample how to add BAAI/bge-m3 embedder to Verba. …
For this to show up in Verba, you also need to adjust goldenverba/components/embedding/manager.py accordingly.
We added the model to the newest release 🚀