
[Bug]: Adding images by uris does not work with AmazonBedrockEmbeddingFunction

Open · jc1518 opened this issue 11 months ago · 0 comments

What happened?

When I use AmazonBedrockEmbeddingFunction (with the "amazon.titan-embed-image-v1" model) as the embedding_function and ImageLoader as the data_loader, calling collection.add() with uris fails with TypeError: Object of type ndarray is not JSON serializable.

Here is the code:

import os
import mimetypes

import boto3
import chromadb
from chromadb.utils.embedding_functions import AmazonBedrockEmbeddingFunction
from chromadb.utils.data_loaders import ImageLoader

boto_session = boto3.Session(profile_name="sandbox", region_name="us-west-2")
model_id = "amazon.titan-embed-image-v1"

embedding_function = AmazonBedrockEmbeddingFunction(
    session=boto_session, model_name=model_id
)

data_loader = ImageLoader()

client = chromadb.Client()

collection = client.get_or_create_collection(
    name="sample-image-library",
    embedding_function=embedding_function,
    data_loader=data_loader,
)

dir_path = "../samples/"

images_uris = sorted(
    [
        os.path.join(dir_path, image)
        for image in os.listdir(dir_path)
        if mimetypes.guess_type(os.path.join(dir_path, image))[0] == "image/png"
    ]
)

ids = [f"{i+1}" for i in range(len(images_uris))]
print(ids)
print(images_uris)

collection.add(ids=ids, uris=images_uris)

Versions

Chroma 0.4.24, Python 3.11.4, macOS 14.4

Relevant log output

(venv) ➜ python test.py

['1', '2', '3']

['../samples/Coles.png', '../samples/Floor-plan.png', '../samples/Spot-difference-01.png']

Traceback (most recent call last):
  File "test.py", line 40, in <module>
    collection.add(ids=ids, uris=images_uris)
  File "/Users/jackie/venv/lib/python3.11/site-packages/chromadb/api/models/Collection.py", line 166, in add
    embeddings = self._embed(self._data_loader(uris))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jackie/venv/lib/python3.11/site-packages/chromadb/api/models/Collection.py", line 633, in _embed
    return self._embedding_function(input=input)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jackie/venv/lib/python3.11/site-packages/chromadb/api/types.py", line 193, in __call__
    result = call(self, input)
             ^^^^^^^^^^^^^^^^^
  File "/Users/jackie/venv/lib/python3.11/site-packages/chromadb/utils/embedding_functions.py", line 763, in __call__
    body = json.dumps(input_body)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jackie/.pyenv/versions/3.11.4/lib/python3.11/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jackie/.pyenv/versions/3.11.4/lib/python3.11/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jackie/.pyenv/versions/3.11.4/lib/python3.11/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
  File "/Users/jackie/.pyenv/versions/3.11.4/lib/python3.11/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type ndarray is not JSON serializable
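
The last frames show the failure happening inside the embedding function's json.dumps(input_body) call. ImageLoader decodes each URI into a numpy array, and Python's standard json module cannot encode ndarray objects. The sketch below reproduces only that step, without Bedrock or Chroma. The "inputText" key is an assumption about how the embedding function builds its text-only request body; the traceback shows the json.dumps call but not the body's contents.

```python
import json

import numpy as np


def reproduce_serialization_error() -> str:
    # Stand-in for one decoded image: ImageLoader yields arrays of this kind
    # (height x width x channels) for each URI.
    image = np.zeros((2, 2, 3), dtype=np.uint8)
    # Assumed text-only request body: the array lands where a string is expected.
    input_body = {"inputText": image}
    try:
        json.dumps(input_body)
    except TypeError as err:
        return str(err)
    return "serialized without error"


print(reproduce_serialization_error())
# Object of type ndarray is not JSON serializable
```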

jc1518, Mar 18 '24 03:03
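
A possible workaround, sketched under assumptions and not tested against Chroma or Bedrock: skip the numpy decoding by loading each file as a base64 string, and use a small custom embedding function that sends it to the model as an image. Base64ImageLoader and TitanImageEmbeddingFunction are hypothetical names. The request shape {"inputImage": <base64 string>} and the "embedding" field in the response are assumed from the Titan Multimodal Embeddings documentation, not taken from this issue. The client is expected to be a boto3 "bedrock-runtime" client.

```python
import base64
import json
from pathlib import Path
from typing import List, Sequence


class Base64ImageLoader:
    """Hypothetical data loader: returns each file's raw bytes as a base64
    string instead of a decoded numpy array, so the result is JSON-serializable."""

    def __call__(self, uris: Sequence[str]) -> List[str]:
        return [base64.b64encode(Path(uri).read_bytes()).decode("ascii") for uri in uris]


class TitanImageEmbeddingFunction:
    """Hypothetical image-only embedding function.

    Assumes the Titan Multimodal Embeddings shape: {"inputImage": <base64>} in,
    {"embedding": [...]} out. `client` is anything with a boto3-style
    invoke_model method, e.g. boto_session.client("bedrock-runtime").
    """

    def __init__(self, client, model_id: str = "amazon.titan-embed-image-v1"):
        self._client = client
        self._model_id = model_id

    # Keep the parameter name `input`: Chroma calls embedding_function(input=...).
    def __call__(self, input: Sequence[str]) -> List[List[float]]:
        embeddings = []
        for image_b64 in input:
            response = self._client.invoke_model(
                modelId=self._model_id,
                body=json.dumps({"inputImage": image_b64}),
                accept="application/json",
                contentType="application/json",
            )
            # boto3 returns the payload as a readable stream; read and decode it.
            payload = json.loads(response["body"].read())
            embeddings.append(payload["embedding"])
        return embeddings
```

To try it, pass embedding_function=TitanImageEmbeddingFunction(boto_session.client("bedrock-runtime")) and data_loader=Base64ImageLoader() to client.get_or_create_collection(...) in the script above, then call collection.add(ids=ids, uris=images_uris) as before. This sketch only embeds images; text queries would need a separate path that sends "inputText".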