
[Bug]: Unexpected "embedding function required" error in the JavaScript version

Open • lynxionxs opened this issue 1 year ago • 7 comments

What happened?

Why am I getting an error asking me to supply an embedding function in the JavaScript version? The docs say nothing about it. I just want to write a simple "Hello World"-type script.

The Chroma server starts successfully:

chroma run --port 5000 --path chroma-data/

When I call collection.add(), it throws this error:

import { ChromaClient } from "chromadb";

const client = new ChromaClient({ path: "http://localhost:5000" });

const collection = await client.getOrCreateCollection({
	name: "collection_b",
});

collection.add({
	documents: [
		"apples on a tree",
		"five pears in a basket",
		"fish in a pond",
		"pool full of water",
		"tasty apple pie",
		"yummy orange juice",
		"a large apple tree",
		"two baskets filled with fruits",
	],
	metadatas: [
		{ fruit: "apple" },
		{ fruit: "pear" },
		{ animal: "fish" },
		{ location: "pool" },
		{ food: "pie" },
		{ drink: "juice" },
		{ fruit: "apple" },
		{ fruit: "fruits" },
	],
	ids: ["doc1", "doc2", "doc3", "doc4", "doc5", "doc6", "doc7", "doc8"],
});

Versions

Chroma v0.4.18, ChromaJS v1.6.1, Python 3.10, Linux Mint 21.2

Relevant log output

/home/user/ChromaDB/node_modules/chromadb/dist/chromadb.mjs:1264
        throw new Error(
              ^

Error: embeddingFunction is undefined. Please configure an embedding function
    at Collection.validate (file:///home/user/ChromaDB/node_modules/chromadb/dist/chromadb.mjs:1264:15)
    at Collection.add (file:///home/user/ChromaDB/node_modules/chromadb/dist/chromadb.mjs:1331:84)
    at file:///home//home/user/ChromaDB/chroma-server.js:9:12
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)

Node.js v20.10.0

lynxionxs, Dec 04 '23 08:12

Hey @lynxionxs, you must provide an embedding function when creating or getting the collection. Here are the API docs: https://docs.trychroma.com/js_reference/Client#getorcreatecollection
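To see why add() fails, here is a simplified, hypothetical sketch of the kind of check that produces this error. It is not the actual chromadb source. When `documents` are passed without precomputed `embeddings`, the collection needs an embedding function to turn the text into vectors, and without one it throws. The names `EmbeddingFunction`, `AddParams`, and `resolveEmbeddings` are illustrative. The sketch also assumes that supplying precomputed `embeddings` skips the embedding function, so verify that against your installed client version.

```typescript
// Illustrative shape of an embedding function: one vector per input text.
interface EmbeddingFunction {
  generate(texts: string[]): Promise<number[][]>;
}

// Hypothetical subset of the parameters passed to collection.add().
interface AddParams {
  ids: string[];
  documents?: string[];
  embeddings?: number[][];
}

// Simplified sketch of the validation step; not the library's real code.
async function resolveEmbeddings(
  params: AddParams,
  embeddingFunction?: EmbeddingFunction,
): Promise<number[][]> {
  if (params.embeddings !== undefined) {
    // Precomputed vectors were supplied, so nothing needs to be embedded.
    return params.embeddings;
  }
  if (params.documents === undefined) {
    throw new Error("Either embeddings or documents must be provided");
  }
  if (embeddingFunction === undefined) {
    // This is the situation in the original report: documents, but no EF.
    throw new Error(
      "embeddingFunction is undefined. Please configure an embedding function",
    );
  }
  return embeddingFunction.generate(params.documents);
}
```

In the report above, `getOrCreateCollection` was called without `embeddingFunction` and `add()` received only `documents`, which corresponds to the third branch.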

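Here is an example from the Chroma docs:

// CJS (use either this or the ESM import below)
const { ChromaClient, OpenAIEmbeddingFunction } = require("chromadb");

// ESM
import { ChromaClient, OpenAIEmbeddingFunction } from "chromadb";

// OpenAIEmbeddingFunction also needs the `openai` package installed (e.g. `npm install openai`)
const client = new ChromaClient(); // defaults to http://localhost:8000
const embedder = new OpenAIEmbeddingFunction({
  openai_api_key: "your_api_key",
});
const collection = await client.createCollection({
  name: "my_collection",
  embeddingFunction: embedder,
});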

tazarov, Dec 04 '23 10:12

@tazarov I'm not using the OpenAI API. Why isn't Chroma taking care of the embedding function by default, like the Python version does? Where in the mess of the docs do they even show how to use an embedding function other than OpenAI and other APIs? Why is making a super simple script so difficult, with no real examples to build on? The docs for getOrCreateCollection() say embeddingFunction is optional: "params.embeddingFunction?: Optional custom embedding function for the collection". So one would expect that, if no embedding function is passed, Chroma will use a default one, like the Python version does.

lynxionxs, Dec 04 '23 11:12

@lynxionxs, I understand your frustration. The historical reason why JS is not on par with Python is that most of the AI/ML libs exist in Python, not JS. This is increasingly not the case and we welcome any and all contributions from people interested in the JS ecosystem ("Aut viam inveniam aut faciam")

For the full list of embedding functions available for JS, check here: https://docs.trychroma.com/embeddings?lang=js

Regarding the optionality of the EF in the getCollection and getOrCreateCollection methods: this is an intermediate design while Chroma matures to a state where users don't need to specify the EF explicitly. There are ongoing discussions and considerations around that, and we're looking to make improvements in the short term.
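Until a default exists on the JS side, any object with a `generate(texts: string[]): Promise<number[][]>` method can be passed as `embeddingFunction`. That is the shape the JS client's embedding-function interface expects in the 1.x line; treat it as an assumption to check against your installed version. For a "Hello World" without an API key, a toy hashing-trick embedder is enough to get `add()` and `query()` running. It only captures word overlap, not meaning, so it is a smoke-test placeholder, not a substitute for MiniLM or OpenAI. `HashingEmbeddingFunction` and its `dim` parameter are hypothetical names.

```typescript
// Toy "hashing trick" embedder: each lowercase word is hashed into one of
// `dim` buckets, bucket counts form the vector, and the vector is L2-normalized.
// It has no semantic understanding; use it only to smoke-test a Chroma setup.
class HashingEmbeddingFunction {
  private readonly dim: number;

  constructor(dim: number = 256) {
    this.dim = dim;
  }

  // FNV-1a-style 32-bit hash over the string's UTF-16 code units.
  private hash(word: string): number {
    let h = 0x811c9dc5;
    for (let i = 0; i < word.length; i++) {
      h ^= word.charCodeAt(i);
      h = Math.imul(h, 0x01000193);
    }
    return h >>> 0; // force an unsigned 32-bit result
  }

  embedOne(text: string): number[] {
    const vec = new Array<number>(this.dim).fill(0);
    for (const word of text.toLowerCase().split(/\W+/).filter(Boolean)) {
      vec[this.hash(word) % this.dim] += 1;
    }
    const norm = Math.sqrt(vec.reduce((s, x) => s + x * x, 0));
    return norm === 0 ? vec : vec.map((x) => x / norm);
  }

  // The method the Chroma JS client calls: one vector per input text.
  async generate(texts: string[]): Promise<number[][]> {
    return texts.map((t) => this.embedOne(t));
  }
}
```

With this class in scope, passing `embeddingFunction: new HashingEmbeddingFunction()` to `getOrCreateCollection` should let the `add()` call from the original report run. Keep using the same embedder for later `query()` calls, since vectors from different embedders are not comparable.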

The default EF in the Python client runs MiniLM (all-MiniLM-L6-v2) locally on the ONNX runtime. I can see that ONNX is also available in JS, but the assumptions vary between deployments: some users might run the Chroma JS client in React apps (i.e., client-side), whereas others may take the Node.js path. That said, I think we can explore adding Node.js-based ONNX support in the near future.
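One way a client library could account for that deployment split is to detect the runtime before choosing a local ONNX-based default. The following is a minimal, hypothetical sketch; `detectRuntime` is an illustrative name and not part of chromadb.

```typescript
type Runtime = "node" | "browser" | "unknown";

// Inspect globals through globalThis so the code needs neither DOM nor Node typings.
function detectRuntime(): Runtime {
  const g = globalThis as any;
  if (typeof g.window !== "undefined" && typeof g.window.document !== "undefined") {
    return "browser";
  }
  if (typeof g.process !== "undefined" && g.process.versions?.node) {
    return "node";
  }
  return "unknown";
}
```

A library could, for example, load a Node-only ONNX Runtime build such as `onnxruntime-node` only when this returns `"node"`, and require an explicit embedding function everywhere else. That policy is illustrative, not something Chroma has committed to.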

tazarov, Dec 04 '23 12:12

@tazarov that sounds good. Hope it gets better. Thanks for the thorough explanation.

lynxionxs, Dec 04 '23 13:12

@lynxionxs, I've created an issue for the default EF developer experience (DX): https://github.com/chroma-core/chroma/issues/1456.

tazarov, Dec 04 '23 13:12

Wassup @tazarov, was this ever resolved?

For posterity, in case this doesn't get resolved, here is a local embedder built on transformers.js:

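import { pipeline, env } from "@xenova/transformers";

// Directory to look in for locally stored models (adjust this path to your machine).
env.localModelPath = "/Users/my_user/develop/all-models";
// ONNX port of all-MiniLM-L6-v2, the model behind the Python client's default EF.
const MODEL_NAME = "Xenova/all-MiniLM-L6-v2";

export async function createEmbedder() {
  // Load the feature-extraction pipeline once and reuse it for every call.
  const extractor = await pipeline("feature-extraction", MODEL_NAME, {
    quantized: false,
  });

  // The Chroma JS client calls generate(texts) and expects one vector per text.
  const generate = async (texts: string[]): Promise<number[][]> => {
    const embeddings: number[][] = await Promise.all(
      texts.map(async (text) => {
        // Mean-pool the token vectors and L2-normalize into one sentence embedding.
        const output = await extractor(text, {
          pooling: "mean",
          normalize: true,
        });
        return Array.from(output.data) as number[];
      })
    );
    return embeddings;
  };

  return { generate };
}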
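The `pooling: "mean"` and `normalize: true` options above collapse the model's per-token vectors into one sentence vector and scale it to unit length. Here is a dependency-free sketch of those two steps. It assumes the token embeddings arrive as a `tokens × dim` array together with an attention mask that marks real (non-padding) tokens. `meanPool` and `l2Normalize` are illustrative names, not transformers.js internals.

```typescript
// Average the token vectors whose attention-mask entry is non-zero.
function meanPool(tokenEmbeddings: number[][], attentionMask: number[]): number[] {
  if (tokenEmbeddings.length === 0) return [];
  const dim = tokenEmbeddings[0].length;
  const sum = new Array<number>(dim).fill(0);
  let count = 0;
  tokenEmbeddings.forEach((vec, t) => {
    if (attentionMask[t] === 0) return; // skip padding tokens
    count++;
    for (let d = 0; d < dim; d++) sum[d] += vec[d];
  });
  return sum.map((x) => x / Math.max(count, 1));
}

// Scale a vector to unit L2 norm, so cosine similarity reduces to a dot product.
function l2Normalize(vec: number[]): number[] {
  const norm = Math.sqrt(vec.reduce((s, x) => s + x * x, 0));
  return norm === 0 ? vec : vec.map((x) => x / norm);
}
```

Normalizing at embedding time means every stored vector has the same length, so distance comparisons in the collection depend only on direction.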

spankyed, Apr 14 '24 08:04

Quoting @tazarov: "Regarding all of the available embedding functions for JS, check here - https://docs.trychroma.com/embeddings?lang=js"

I'm new to this, just trying to play around, and I'm getting the same error. The link you provided doesn't seem to work ("404 This page could not be found.").

The whole process is a bit frustrating. I could not find an installer for Windows. I had to follow all the steps, starting with installing Python so I could use the 'pip' command in the console... then Chroma wanted some MS runtime libraries... then it did not compile, so I had to install 3+ GB of the MS SDK... and now it wants embeddings from OpenAI. It seems never-ending. What else will it require after I get an OpenAI API key and supply it?

UPD: after supplying the OpenAI API key, it failed again :) "Uncaught Error Error: Please install openai as a dependency with, e.g. yarn add openai"

expy777, May 16 '24 10:05