[Bug]: Unexpected required embedding function error in the JavaScript version
What happened?
Why am I getting an error telling me to supply an embedding function in the JavaScript version? The docs say nothing about it. I just want to write a simple "Hello World" type script.
The Chroma server starts successfully
chroma run --port 5000 --path chroma-data/
When I call collection.add(), it throws that error:
import { ChromaClient } from "chromadb";

const client = new ChromaClient({ path: "http://localhost:5000" });
const collection = await client.getOrCreateCollection({
  name: "collection_b",
});
collection.add({
  documents: [
    "apples on a tree",
    "five pears in a basket",
    "fish in a pond",
    "pool full of water",
    "tasty apple pie",
    "yummy orange juice",
    "a large apple tree",
    "two baskets filled with fruits",
  ],
  metadatas: [
    { fruit: "apple" },
    { fruit: "pear" },
    { animal: "fish" },
    { location: "pool" },
    { food: "pie" },
    { drink: "juice" },
    { fruit: "apple" },
    { fruit: "fruits" },
  ],
  ids: ["doc1", "doc2", "doc3", "doc4", "doc5", "doc6", "doc7", "doc8"],
});
Versions
Chroma v0.4.18 , ChromaJS v1.6.1 , Python 3.10 , Linux Mint 21.2
Relevant log output
/home/user/ChromaDB/node_modules/chromadb/dist/chromadb.mjs:1264
throw new Error(
^
Error: embeddingFunction is undefined. Please configure an embedding function
at Collection.validate (file:///home/user/ChromaDB/node_modules/chromadb/dist/chromadb.mjs:1264:15)
at Collection.add (file:///home/user/ChromaDB/node_modules/chromadb/dist/chromadb.mjs:1331:84)
at file:///home//home/user/ChromaDB/chroma-server.js:9:12
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Node.js v20.10.0
Hey @lynxionxs, you must provide an embedding function when creating or getting the collection. Here are the API docs: https://docs.trychroma.com/js_reference/Client#getorcreatecollection
Here is an example from Chroma docs:
// CJS (use either this import or the ESM one below, not both)
const { OpenAIEmbeddingFunction } = require("chromadb");
// ESM
import { OpenAIEmbeddingFunction } from "chromadb";

const embedder = new OpenAIEmbeddingFunction({
  openai_api_key: "your_api_key",
});
const collection = await client.createCollection({
  name: "my_collection",
  embeddingFunction: embedder,
});
@tazarov I'm not using the OpenAI API. Why doesn't Chroma take care of the embedding function the way the default Python version does?
Where in the mess of the docs do they even show how to use an embedding function other than OpenAI and other APIs? Why is writing a super simple script so difficult, with no real examples to build on?
The docs for getOrCreateCollection() say embeddingFunction is optional:
params.embeddingFunction?: Optional custom embedding function for the collection.
So one would expect that, when no embedding function is passed, Chroma will use a default one, like the Python version does?
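For anyone landing here without an OpenAI key: as far as this thread shows, the collection only needs an object with a generate(texts) method that returns one number array per text (that is the shape of the { generate } object returned by the @xenova/transformers snippet further down). Below is a minimal sketch of such an object, just to get a "Hello World" running. The names EmbeddingFunctionLike, embedText, toyEmbedder and the 16-dimension size are made up for illustration, and the word-hashing "embedding" is deterministic but not semantically meaningful, so query results will not be useful.

```typescript
// Structural shape the ChromaJS 1.x client appears to expect from an
// embedding function (assumption: inferred from the `{ generate }` object
// used elsewhere in this thread, not copied from the library's typings).
interface EmbeddingFunctionLike {
  generate(texts: string[]): Promise<number[][]>;
}

// Hypothetical vector size, chosen only for illustration.
const DIMENSIONS = 16;

// Toy embedding: hash every word into one of `dimensions` buckets, count the
// hits, then L2-normalize. Deterministic, but NOT a real language model.
function embedText(text: string, dimensions: number = DIMENSIONS): number[] {
  const vector = new Array<number>(dimensions).fill(0);
  const words = text.toLowerCase().split(/\W+/).filter(Boolean);
  for (const word of words) {
    let hash = 0;
    for (let i = 0; i < word.length; i++) {
      // Keep the running hash inside the unsigned 32-bit range.
      hash = (hash * 31 + word.charCodeAt(i)) >>> 0;
    }
    const bucket = hash % dimensions;
    vector[bucket] = (vector[bucket] ?? 0) + 1;
  }
  const norm = Math.hypot(...vector);
  // An empty text yields the zero vector, which cannot be normalized.
  return norm === 0 ? vector : vector.map((value) => value / norm);
}

// Object that can be handed to the client as `embeddingFunction`.
const toyEmbedder: EmbeddingFunctionLike = {
  generate: async (texts) => texts.map((text) => embedText(text)),
};

// Usage sketch (not executed here; assumes the `client` from the report):
// const collection = await client.getOrCreateCollection({
//   name: "collection_b",
//   embeddingFunction: toyEmbedder,
// });
```

Swapping toyEmbedder for a real model later should only require replacing the body of generate, since the collection calls nothing else on it.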
@lynxionxs, I understand your frustration. The historical reason why JS is not on par with Python is that most of the AI/ML libs exist in Python, not JS. This is increasingly not the case and we welcome any and all contributions from people interested in the JS ecosystem ("Aut viam inveniam aut faciam")
Regarding all of the available embedding functions for JS, check here - https://docs.trychroma.com/embeddings?lang=js
Regarding the optionality of the EF in the getCollection and getOrCreateCollection methods, this is an intermediary design while Chroma matures to a state where users don't need to specify the EF explicitly. There are some discussions and considerations around that, and we're looking to make improvements in the short run.
The default EF in the Python client relies on MiniLM running on the ONNX runtime, which operates locally. I can see that ONNX is also available in JS, but some assumptions vary between deployments: e.g. some users might choose to run the Chroma JS client in React apps (i.e. client-side), whereas others may choose the Node.js path. That said, I think we can explore options for adding Node.js-based ONNX support in the near future.
@tazarov that sounds good. Hope it gets better. Thanks for the thorough explanation.
@lynxionxs, I've created an issue for the default EF DX - https://github.com/chroma-core/chroma/issues/1456.
Wasup, was this ever resolved? @tazarov
For posterity, in case this doesn't get resolved:
import { pipeline, env } from "@xenova/transformers";

env.localModelPath = "/Users/my_user/develop/all-models";
const MODEL_NAME = "Xenova/all-MiniLM-L6-v2";

export async function createEmbedder() {
  const extractor = await pipeline("feature-extraction", MODEL_NAME, {
    quantized: false,
  });
  const generate = async (texts: string[]): Promise<number[][]> => {
    const embeddings: number[][] = await Promise.all(
      texts.map(async (text) => {
        const output = await extractor(text, {
          pooling: "mean",
          normalize: true,
        });
        return Array.from(output.data) as number[];
      })
    );
    return embeddings;
  };
  return { generate };
}
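The snippet above only builds the embedder, so here is a hedged sketch of one way to wire it into collection.add(). Instead of relying on the collection to call the embedder, it computes the vectors first and passes them in the embeddings field next to the documents. Judging by the stack trace in the report (Collection.validate), the error seems to fire only when documents arrive without embeddings, but that is an assumption about ChromaJS v1.6.1 and is not confirmed in this thread. The names EmbedderLike, AddParams, CollectionLike, DocRecord, toAddParams and addWithEmbeddings are hypothetical, and the interfaces are structural stand-ins so the sketch compiles without importing chromadb.

```typescript
// Structural stand-ins (assumptions) for the pieces of the client used here.
interface EmbedderLike {
  generate(texts: string[]): Promise<number[][]>;
}

interface AddParams {
  ids: string[];
  embeddings: number[][];
  documents: string[];
  metadatas: Record<string, string>[];
}

interface CollectionLike {
  add(params: AddParams): Promise<unknown>;
}

// Hypothetical record type: one document with its id and metadata.
interface DocRecord {
  id: string;
  document: string;
  metadata: Record<string, string>;
}

// Turn per-document records into the parallel arrays that add() takes,
// failing early if the embedder returned the wrong number of vectors.
function toAddParams(records: DocRecord[], embeddings: number[][]): AddParams {
  if (records.length !== embeddings.length) {
    throw new Error(
      `expected ${records.length} embeddings, got ${embeddings.length}`
    );
  }
  return {
    ids: records.map((record) => record.id),
    embeddings,
    documents: records.map((record) => record.document),
    metadatas: records.map((record) => record.metadata),
  };
}

// Embed the documents, then add them with explicit embeddings.
async function addWithEmbeddings(
  collection: CollectionLike,
  embedder: EmbedderLike,
  records: DocRecord[]
): Promise<unknown> {
  const texts = records.map((record) => record.document);
  const embeddings = await embedder.generate(texts);
  return collection.add(toAddParams(records, embeddings));
}

// Usage sketch (not executed here):
// const embedder = await createEmbedder();
// await addWithEmbeddings(collection, embedder, [
//   { id: "doc1", document: "apples on a tree", metadata: { fruit: "apple" } },
// ]);
```

The alternative is to pass the object returned by createEmbedder() as embeddingFunction when calling getOrCreateCollection(), which keeps the original add() call unchanged.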
@tazarov, regarding "all of the available embedding functions for JS, check here - https://docs.trychroma.com/embeddings?lang=js":
I'm new to this, trying to play around, and I'm hitting the same error. The link you provided does not seem to work ("404 This page could not be found.").
The whole process is a bit frustrating. I could not find an installer for Windows. I had to follow all the steps, starting with installing Python so I could use the 'pip' command in the console... then Chroma wanted some MS runtime libraries... then it did not compile, so I had to install 3+ GB of MS SDK... now it wants embeddings from OpenAI. It seems never-ending. What else will it require after I get an OpenAI API key and supply it?
UPD: after supplying it with an OpenAI API key, it failed again :)
"Uncaught Error Error: Please install openai as a dependency with, e.g. yarn add openai"