[Bug]: uris does not work with AmazonBedrockEmbeddingFunction
What happened?
When using AmazonBedrockEmbeddingFunction (with the model "amazon.titan-embed-image-v1") as the embedding_function and ImageLoader as the data_loader, calling collection.add with uris raises TypeError: Object of type ndarray is not JSON serializable.
Here is the code:
import os
import mimetypes

import boto3
import chromadb
from chromadb.utils.embedding_functions import AmazonBedrockEmbeddingFunction
from chromadb.utils.data_loaders import ImageLoader

# Bedrock embedding function backed by the Titan multimodal model.
boto_session = boto3.Session(profile_name="sandbox", region_name="us-west-2")
model_id = "amazon.titan-embed-image-v1"
embedding_function = AmazonBedrockEmbeddingFunction(
    session=boto_session, model_name=model_id
)
data_loader = ImageLoader()

client = chromadb.Client()
collection = client.get_or_create_collection(
    name="sample-image-library",
    embedding_function=embedding_function,
    data_loader=data_loader,
)

# Collect the PNG files in the sample directory.
dir_path = "../samples/"
images_uris = sorted(
    [
        os.path.join(dir_path, image)
        for image in os.listdir(dir_path)
        if mimetypes.guess_type(os.path.join(dir_path, image))[0] == "image/png"
    ]
)
ids = [f"{i+1}" for i in range(len(images_uris))]
print(ids)
print(images_uris)

# This call raises the TypeError shown in the log output.
collection.add(ids=ids, uris=images_uris)
Versions
Chroma 0.4.24, Python 3.11.4, macOS 14.4
Relevant log output
(venv) ➜ python test.py
['1', '2', '3']
['../samples/Coles.png', '../samples/Floor-plan.png', '../samples/Spot-difference-01.png']
Traceback (most recent call last):
  File "test.py", line 40, in <module>
    collection.add(ids=ids, uris=images_uris)
  File "/Users/jackie/venv/lib/python3.11/site-packages/chromadb/api/models/Collection.py", line 166, in add
    embeddings = self._embed(self._data_loader(uris))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jackie/venv/lib/python3.11/site-packages/chromadb/api/models/Collection.py", line 633, in _embed
    return self._embedding_function(input=input)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jackie/venv/lib/python3.11/site-packages/chromadb/api/types.py", line 193, in __call__
    result = call(self, input)
             ^^^^^^^^^^^^^^^^^
  File "/Users/jackie/venv/lib/python3.11/site-packages/chromadb/utils/embedding_functions.py", line 763, in __call__
    body = json.dumps(input_body)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jackie/.pyenv/versions/3.11.4/lib/python3.11/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jackie/.pyenv/versions/3.11.4/lib/python3.11/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jackie/.pyenv/versions/3.11.4/lib/python3.11/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
  File "/Users/jackie/.pyenv/versions/3.11.4/lib/python3.11/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type ndarray is not JSON serializable
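Possible workaround
The traceback shows the arrays returned by the data loader reaching json.dumps unchanged inside the embedding function. A minimal sketch of one way around this, not a confirmed fix: read the image bytes directly, base64-encode them, and call Bedrock without going through ImageLoader. The helper names build_titan_image_body and embed_image are hypothetical, and the "inputImage" request field and "embedding" response field are assumptions about the Titan multimodal request format that should be checked against the Bedrock documentation.

```python
import base64
import json


def build_titan_image_body(image_bytes: bytes) -> str:
    """Return a JSON request body carrying the image as base64 text."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    # Assumption: "inputImage" is the field the Titan image model expects.
    return json.dumps({"inputImage": encoded})


def embed_image(bedrock_client, image_path, model_id="amazon.titan-embed-image-v1"):
    """Hypothetical helper: embed one image file via a bedrock-runtime client."""
    with open(image_path, "rb") as f:
        body = build_titan_image_body(f.read())
    response = bedrock_client.invoke_model(
        body=body,
        modelId=model_id,
        accept="application/json",
        contentType="application/json",
    )
    # Assumption: the response JSON exposes the vector under "embedding".
    return json.loads(response["body"].read())["embedding"]
```

With a client created as boto_session.client("bedrock-runtime"), the resulting vectors could then be passed explicitly, for example collection.add(ids=ids, embeddings=[embed_image(bedrock, u) for u in images_uris], uris=images_uris), so the embedding function is never invoked on ndarray input.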