TypeError: Cannot read properties of undefined (reading 'data') with existing JS library
I think there's a bug in the JS library. It throws a weird error with an unhelpful message:
node databaseInserter.mjs
/home/asd/go/src/xxx/node_modules/chromadb/dist/main/index.js:234
return response.data;
^
TypeError: Cannot read properties of undefined (reading 'data')
at /home/asd/go/src/xxx/node_modules/chromadb/dist/main/index.js:234:29
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async ChromaClient.createCollection (/home/asd/go/src/xxx/node_modules/chromadb/dist/main/index.js:229:31)
at async file:///home/asd/go/src/xxx/databaseInserter.mjs:8:20
Node.js v18.15.0
The code (a minimal modification of the example):
import { ChromaClient, OpenAIEmbeddingFunction } from "chromadb";
import fs from "fs";
const chroma_client = new ChromaClient();
const OPENAI_API_KEY = fs.readFileSync('openai.key', 'utf8');
const embedder = new OpenAIEmbeddingFunction(OPENAI_API_KEY)
const collection = await chroma_client.createCollection("my_collection", {}, embedder)
const jsonBytes = fs.readFileSync('out.json', 'utf8'); // contains {"documentList": [...], "metaList": [...]}
const json = JSON.parse(jsonBytes);
// create id
let counter = 0;
const ids = json.documentList.map(() => "id" + (++counter)); // convert to id
await collection.add(
  ids,
  undefined,
  json.metaList,
  json.documentList,
);
console.log('total documents', await collection.count());
package.json:
{
  "dependencies": {
    "chromadb": "^1.3.1",
    "openai": "^3.2.1"
  }
}
+1
@kokizzu @rrubio hello! I agree this error message is not helpful. It happens because the Chroma backend the client expects on its default port (localhost:8000, usually the Docker container) is not up. I was able to reproduce this locally. Can you confirm that you are running the Docker container? Currently the JS library is a client only and talks to a backend.
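For anyone who wants to confirm this is the cause, a minimal connectivity check before any collection calls looks roughly like this. It assumes the default backend address of localhost:8000 and that heartbeat() is exposed by the client version you have installed, so treat it as a sketch rather than the documented error-handling path:
import { ChromaClient } from "chromadb";

const client = new ChromaClient(); // defaults to the backend at http://localhost:8000

try {
  // If the Chroma server (e.g. the Docker container) is not running,
  // this call fails here instead of surfacing later as the opaque
  // "Cannot read properties of undefined (reading 'data')" error.
  await client.heartbeat();
  console.log("Chroma backend is reachable");
} catch (err) {
  console.error("Chroma backend is not reachable on localhost:8000; start the Docker container first.");
  process.exit(1);
}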
following up here! would love to get to the bottom of this
@jeffchuber, I weirdly got this error while trying to ingest a lot of PDF files, 10 or so. Each file is split into chunks and embedded before being inserted using LangChain.
When ingesting 2-4 PDF files at a time, it was OK.
Error:
TypeError: Cannot read properties of undefined (reading 'data')
at <anonymous> (/<removed>/node_modules/.pnpm/[email protected]/node_modules/chromadb/dist/main/index.js:136:29)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at Collection.add (/<removed>/node_modules/.pnpm/[email protected]/node_modules/chromadb/dist/main/index.js:124:26)
at Chroma.addVectors (/<removed>/node_modules/.pnpm/[email protected]_bte6oiujo3pczovvxy7tybwkpm/node_modules/langchain/dist/vectorstores/chroma.js:78:9)
at Chroma.addDocuments (/<removed>/node_modules/.pnpm/[email protected]_bte6oiujo3pczovvxy7tybwkpm/node_modules/langchain/dist/vectorstores/chroma.js:39:9)
at Function.fromDocuments (/<removed>/node_modules/.pnpm/[email protected]_bte6oiujo3pczovvxy7tybwkpm/node_modules/langchain/dist/vectorstores/chroma.js:119:9)
at run (/<removed>/scripts/ingest-data.ts:43:5)
at <anonymous> (/<removed>/scripts/ingest-data.ts:53:3)
Code (follows the LangChain + Chroma docs):
// load doc
// ..
await Chroma.fromDocuments(docs, embeddings, {
  collectionName: CHROMA_COLLECTION_NAME,
});
Edit: My apologies, my error comes from a different issue than the original question. Should I open a separate issue here or on the LangChain repo?
Edit 2: After testing, I can only addVectors in batches of 499. Anything higher causes this error.
@NanoCode012 499 is the dimensionality of the vectors? Any anything over that breaks? That is very odd!
Sorry for the late reply @jeffchuber. No, I use OpenAI, so the dimensionality follows the ada model's (1,500-something).
The 499 is the number of OpenAI embeddings inserted at one time; I had to loop my insertions in batches of that size. Also, I apologize if this is the wrong thread..
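Applied to the script in the original post, the workaround would look roughly like this. The 499 batch size is only the empirical limit I hit, not a documented constant, and the positional add() arguments match the chromadb version used above:
// Insert in batches so a single add() never carries more than 499 documents
const BATCH_SIZE = 499;
for (let i = 0; i < ids.length; i += BATCH_SIZE) {
  const end = Math.min(i + BATCH_SIZE, ids.length);
  await collection.add(
    ids.slice(i, end),
    undefined, // embeddings are computed by the collection's embedding function
    json.metaList.slice(i, end),
    json.documentList.slice(i, end),
  );
}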
I also encountered the same problem. All was good when I tested locally with ~10 PDF files, but in the prod environment I hit it every time when trying to ingest 500 PDF files.
Mind sharing how you solved it? Thanks.
/home/xxx/node_modules/chromadb/dist/main/index.js:136
return response.data;
^
TypeError: Cannot read properties of undefined (reading 'data')
at /home/xxx/node_modules/chromadb/dist/main/index.js:136:29
at processTicksAndRejections (node:internal/process/task_queues:96:5)
at async Collection.add (/home/xxx/node_modules/chromadb/dist/main/index.js:124:26)
at async Chroma.addVectors (file:///home/xxx/node_modules/langchain/dist/vectorstores/chroma.js:78:9)
at async Chroma.addDocuments (file:///home/xxx/node_modules/langchain/dist/vectorstores/chroma.js:39:9)
at async Function.fromDocuments (file:///home/xxx/node_modules/langchain/dist/vectorstores/chroma.js:119:9)
at async run (file:///home/xxx/dist/ingest/ingestdirectory.js:29:25)
const model = new OpenAI(
  {
    modelName: "gpt-3.5-turbo",
    temperature: 0,
    cache: true,
    concurrency: 5,
    verbose: true,
    openAIApiKey: process.env.OPENAI_API_KEY
  }
);
// const directoryPath = "PDF_DB";
const loader = new DirectoryLoader(
  "PDF_DB/",
  {
    ".pdf": (path) => new PDFLoader(path),
  },
  true
);
const raw_docs = await loader.load();
const textSplitter = new RecursiveCharacterTextSplitter({ chunkSize: 200, chunkOverlap: 40 });
const docs = await textSplitter.splitDocuments(raw_docs);
const vectorStore = await Chroma.fromDocuments(
  docs,
  new OpenAIEmbeddings(),
  {
    collectionName: "collection_name"
  }
);
Hello @happybit, I re-created how LangChain inserts docs but edited it to submit at most 499 embeddings at a time.
FYI: for me it's not 500 docs, but 500 embeddings.
const docs = await textSplitter.splitDocuments(rawDocs);
const embeddings = new OpenAIEmbeddings();
const texts = docs.map(({ pageContent }) => pageContent);
const embedTexts = await embeddings.embedDocuments(texts);
const chroma = new Chroma(embeddings, {
  collectionName: CHROMA_COLLECTION_NAME,
});
// Add the documents to the vector store in pieces
let j = 0;
const increment = 499;
for (let i = 0; i < docs.length; i += increment) {
  j = Math.min(i + increment, docs.length);
  await chroma.addVectors(embedTexts.slice(i, j), docs.slice(i, j));
}
I will try. Thanks a lot!
@NanoCode012 Great, it works like a charm. Thank you so much 😄
I suspect this is fixed by the switch from axios to fetch.