chromadb-java-client

Default ChromaDB embeddings `all-MiniLM-L6-v2`

Open namedgraph opened this issue 10 months ago • 1 comments

My understanding was that ChromaDB's default embeddings run locally and do not require an API key. However, I cannot find an example like this in the README; all the examples require an API key. Am I missing something?

namedgraph avatar Apr 18 '24 22:04 namedgraph

I got myself a HuggingFace API key and tried using HuggingFaceEmbeddingFunction.

The documents get created in ChromaDB with correct documents and metadata, but the embeddings field is null. Is that expected?

namedgraph avatar Apr 19 '24 08:04 namedgraph

@namedgraph, Chroma would not accept a request with null embeddings, so let me have a look. As for the default embedding function, you are right that it runs locally, but that applies only to the Python client.

I'll investigate whether ONNX Runtime (it is written in C/C++, so it might not be as platform-independent as Java) can be executed from within Java.

tazarov avatar May 27 '24 18:05 tazarov

@namedgraph, you are in luck: Microsoft has added Java support to ONNX Runtime. See https://github.com/microsoft/onnxruntime/blob/main/java/README.md.

I'll implement it shortly.

tazarov avatar May 27 '24 19:05 tazarov

@tazarov Hello, how can I run ChromaDB's default embeddings locally?

haqian555 avatar Aug 07 '24 06:08 haqian555

@namedgraph and @haqian555, I spent some time on this today and I'm happy to say that I've managed to get a default embedding function running with the MiniLM model and generating results in line with what the original Chroma EF produces. The good news is that it will also work for better models that have been converted to ort.

I'll run some tests that prove this works not only on my machine :) I'll add this functionality over the next couple of days. Thanks for your patience :).

tazarov avatar Aug 07 '24 15:08 tazarov
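For readers wondering what "in line with the original Chroma EF" involves: as far as I understand it, the all-MiniLM-L6-v2 pipeline takes the per-token vectors the model outputs, mean-pools them using the attention mask, and then L2-normalizes the result. Below is a minimal sketch of just that post-processing step in plain Java. The class and method names are hypothetical and are not taken from chromadb-java-client or ONNX Runtime; tokenization and the model call itself are left out.

```java
// Sketch only: MiniLmPooling and its methods are hypothetical names,
// not part of chromadb-java-client or ONNX Runtime.
final class MiniLmPooling {

    /** Averages token vectors, counting only positions whose attention mask is non-zero. */
    static float[] meanPool(float[][] tokenEmbeddings, long[] attentionMask) {
        if (tokenEmbeddings.length == 0 || tokenEmbeddings.length != attentionMask.length) {
            throw new IllegalArgumentException("need one mask entry per token");
        }
        int dim = tokenEmbeddings[0].length;
        float[] pooled = new float[dim];
        int counted = 0;
        for (int t = 0; t < tokenEmbeddings.length; t++) {
            if (attentionMask[t] == 0) {
                continue; // padding token: leave it out of the average
            }
            for (int d = 0; d < dim; d++) {
                pooled[d] += tokenEmbeddings[t][d];
            }
            counted++;
        }
        int divisor = Math.max(counted, 1); // guard against an all-padding input
        for (int d = 0; d < dim; d++) {
            pooled[d] /= divisor;
        }
        return pooled;
    }

    /** Scales a vector to unit L2 length; a zero vector is returned unchanged. */
    static float[] l2Normalize(float[] v) {
        double sumSquares = 0.0;
        for (float x : v) {
            sumSquares += (double) x * x;
        }
        double norm = Math.sqrt(sumSquares);
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = norm == 0.0 ? v[i] : (float) (v[i] / norm);
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy input: three tokens of dimension 2, the last one is padding.
        float[][] tokens = {{1f, 2f}, {3f, 4f}, {100f, 100f}};
        long[] mask = {1, 1, 0};
        float[] embedding = l2Normalize(meanPool(tokens, mask));
        System.out.println(java.util.Arrays.toString(embedding));
    }
}
```

In a real embedding function the `tokens` array would come from the model's last hidden state and would have 384 columns for this model; the toy values here only show that the padded position is ignored.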

That's great, I really appreciate your work.

haqian555 avatar Aug 08 '24 06:08 haqian555

@haqian555 and @namedgraph, the default EF functionality is now merged. Sorry it took a little longer, but I had to make sure it was identical to Chroma's default EF and the SentenceTransformers equivalents in Python.

tazarov avatar Aug 12 '24 12:08 tazarov
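If you want to check for yourself that two embedding functions agree, one simple approach is to embed the same text with both and compare the vectors. The sketch below is an assumption about how such a check could look, not the project's actual test code: the class name, the method names, and the idea of using a tolerance are all hypothetical. A tolerance is used because floating-point results from different runtimes rarely match bit for bit.

```java
// Sketch only: EmbeddingCheck is a hypothetical helper, not part of chromadb-java-client.
final class EmbeddingCheck {

    /** Cosine similarity of two equal-length vectors; returns 0 if either has zero length. */
    static double cosineSimilarity(float[] a, float[] b) {
        requireSameLength(a, b);
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Largest absolute element-wise difference between two equal-length vectors. */
    static double maxAbsDiff(float[] a, float[] b) {
        requireSameLength(a, b);
        double max = 0.0;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs((double) a[i] - b[i]));
        }
        return max;
    }

    /** True when every element differs by at most the given tolerance. */
    static boolean closeTo(float[] a, float[] b, double tolerance) {
        return maxAbsDiff(a, b) <= tolerance;
    }

    private static void requireSameLength(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
    }

    public static void main(String[] args) {
        // Made-up vectors standing in for the Java and Python embeddings of one text.
        float[] fromJava = {0.12f, -0.34f, 0.56f};
        float[] fromPython = {0.12f, -0.34f, 0.56001f};
        System.out.println("cosine similarity: " + cosineSimilarity(fromJava, fromPython));
        System.out.println("within 1e-4: " + closeTo(fromJava, fromPython, 1e-4));
    }
}
```

The tolerance to use is a judgment call; the `1e-4` above is only an illustrative value.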