
[Bug]: ONNXRuntime error on multiple document upserts

Open 0xjgv opened this issue 1 year ago • 4 comments

What happened?

Getting the following error when upserting multiple documents/metadatas/ids:

        self.collection.upsert(
            documents=documents[i : i + chunk_size],
            metadatas=metadatas[i : i + chunk_size],
            ids=ids[i : i + chunk_size],
        )

2024-02-27 17:33:39.257308 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running CoreML_3590863580933273282_1 node. Name:'CoreMLExecutionProvider_CoreML_3590863580933273282_1_1' Status Message: Error executing model: Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model (error code: -1).

The upsertion works as expected when inserting documents one at a time.

            self.collection.upsert(documents=[doc], ids=[id], metadatas=[metadata])
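Since single-document upserts succeed while batches fail, one stopgap is to try each batch and retry its items one at a time if the batch raises. The sketch below is only an illustration: `upsert_in_chunks` is a hypothetical helper, not part of chromadb, and it assumes the failure surfaces as a Python exception (as the "Error during batch upsert" log line suggests) without assuming its exact type.

```python
from typing import Any, Dict, Sequence


def upsert_in_chunks(
    collection: Any,
    documents: Sequence[str],
    metadatas: Sequence[Dict[str, Any]],
    ids: Sequence[str],
    chunk_size: int = 100,
) -> int:
    """Upsert in chunks; if a chunk fails, retry its items individually.

    Returns the number of chunks that needed the one-at-a-time fallback.
    `collection` is anything with an `upsert(documents=, metadatas=, ids=)` method.
    """
    fallbacks = 0
    for i in range(0, len(ids), chunk_size):
        docs = list(documents[i : i + chunk_size])
        metas = list(metadatas[i : i + chunk_size])
        chunk_ids = list(ids[i : i + chunk_size])
        try:
            collection.upsert(documents=docs, metadatas=metas, ids=chunk_ids)
        except Exception:
            # Broad catch on purpose: the exact exception class raised for the
            # ONNX Runtime failure is not assumed here.
            fallbacks += 1
            for doc, meta, doc_id in zip(docs, metas, chunk_ids):
                collection.upsert(documents=[doc], metadatas=[meta], ids=[doc_id])
    return fallbacks
```

Retrying is safe here because an upsert keyed by the same id overwrites rather than duplicates. The fallback is much slower than batching, so it is a workaround rather than a fix.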

Versions

[tool.poetry.dependencies]
python = "^3.12"
chromadb = "^0.4.23"

Relevant log output

2024-02-27 17:33:39.257308 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running CoreML_3590863580933273282_1 node. Name:'CoreMLExecutionProvider_CoreML_3590863580933273282_1_1' Status Message: Error executing model: Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model (error code: -1).
[EntityIndexer | 2024-02-27 17:33:39.258340] Error during batch upsert: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running CoreML_3590863580933273282_1 node. Name:'CoreMLExecutionProvider_CoreML_3590863580933273282_1_1' Status Message: Error executing model: Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model (error code: -1)

0xjgv avatar Feb 27 '24 16:02 0xjgv

@0xjgv, this is an onnxruntime issue on macOS. If you downgrade to 1.16.3 (pip install onnxruntime==1.16.3), you should be OK.

tazarov avatar Mar 10 '24 05:03 tazarov

@0xjgv did this fix your issue? I'm doing a bug hygiene pass. Thanks!

beggers avatar Mar 12 '24 15:03 beggers

I hit the same issue and have a bit more information. On a MacBook with an M1 CPU, chromadb 0.4.24 with onnxruntime 1.17.3 works fine. On an older MacBook with an Intel CPU, it fails. Dropping back to onnxruntime 1.16.3 on the Intel MacBook fixes it.

dannyrappleyea avatar Apr 16 '24 12:04 dannyrappleyea

I hit the same issue. On a MacBook with an Intel CPU, chromadb 0.5.5 with onnxruntime 1.18.1 fails. Dropping back to onnxruntime 1.16.3 fixes it.

jdzhang1221 avatar Aug 01 '24 10:08 jdzhang1221
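To check which onnxruntime version an environment actually has before deciding whether to downgrade, something like the following may help. It is a sketch using only the standard library. `KNOWN_GOOD` and the helper names are illustrative, and the comparison only encodes what this thread reports (1.16.3 working, 1.17.3 and 1.18.1 failing on Intel Macs). Whether other versions are affected is not established here.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

KNOWN_GOOD = "1.16.3"  # version reported as working in this thread


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.18.1' into (1, 18, 1); only the first three numbers are used."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def installed_onnxruntime_version() -> Optional[str]:
    """Return the installed onnxruntime version, or None if it is not installed.

    Only the distribution named 'onnxruntime' is checked; variants published
    under other names are not detected.
    """
    try:
        return metadata.version("onnxruntime")
    except metadata.PackageNotFoundError:
        return None


def is_newer_than_known_good(version: str, known_good: str = KNOWN_GOOD) -> bool:
    """True if `version` is strictly newer than the version reported as working."""
    return parse_version(version) > parse_version(known_good)


if __name__ == "__main__":
    found = installed_onnxruntime_version()
    if found is None:
        print("onnxruntime is not installed")
    elif is_newer_than_known_good(found):
        print(f"onnxruntime {found} is newer than {KNOWN_GOOD}")
    else:
        print(f"onnxruntime {found} is at or below {KNOWN_GOOD}")
```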

@jdzhang1221, @dannyrappleyea, @0xjgv there are some inherent limitations in the CoreML provider, which ONNX Runtime uses by default on macOS. To avoid them, you can switch to the CPU provider (possibly slightly slower):

import chromadb
from chromadb.utils.embedding_functions.onnx_mini_lm_l6_v2 import ONNXMiniLM_L6_V2  # this is how you import in Chroma 0.5.0+
# from chromadb.utils.embedding_functions import ONNXMiniLM_L6_V2  # legacy import for Chroma <=0.4.24

# Request the CPU provider instead of the default CoreML provider.
ef = ONNXMiniLM_L6_V2(preferred_providers=["CPUExecutionProvider"])

client = chromadb.Client()
collection = client.get_or_create_collection("<collection_name>", embedding_function=ef)
collection.upsert(
    documents=[
        "This is a document about pineapple",
        "This is a document about oranges",
    ],
    ids=["id1", "id2"],
)

tazarov avatar Aug 28 '24 10:08 tazarov
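For code that runs on several machines, the CPU provider could be requested only where this thread reports failures, that is, on Intel Macs. The sketch below is an assumption-laden illustration: `choose_preferred_providers` is a hypothetical helper, and it relies on `platform.machine()` returning "x86_64" on Intel Macs. A Python build running under Rosetta on Apple silicon would also report "x86_64".

```python
import platform
from typing import List, Optional


def choose_preferred_providers(
    system: Optional[str] = None,
    machine: Optional[str] = None,
) -> Optional[List[str]]:
    """Return ["CPUExecutionProvider"] on Intel macOS, otherwise None.

    None is meant as "no preference". The arguments default to the current
    host and exist mainly so the function can be tested.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    if system == "Darwin" and machine == "x86_64":
        return ["CPUExecutionProvider"]
    return None


# Possible usage, assuming ONNXMiniLM_L6_V2 as imported in the comment above:
#   providers = choose_preferred_providers()
#   ef = (ONNXMiniLM_L6_V2(preferred_providers=providers)
#         if providers else ONNXMiniLM_L6_V2())
```

This keeps the default provider on machines where it is reported to work, at the cost of a platform check that may need adjusting if other configurations turn out to be affected.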