
TypeError: Cannot read properties of undefined (reading 'data') with existing JS library

Open kokizzu opened this issue 2 years ago • 10 comments

I think there's a bug in the JS library.

A weird error with an unhelpful message:

node databaseInserter.mjs
/home/asd/go/src/xxx/node_modules/chromadb/dist/main/index.js:234
            return response.data;
                            ^

TypeError: Cannot read properties of undefined (reading 'data')
    at /home/asd/go/src/xxx/node_modules/chromadb/dist/main/index.js:234:29
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async ChromaClient.createCollection (/home/asd/go/src/xxx/node_modules/chromadb/dist/main/index.js:229:31)
    at async file:///home/asd/go/src/xxx/databaseInserter.mjs:8:20

Node.js v18.15.0

The code (minimally modified from the example):

import { ChromaClient, OpenAIEmbeddingFunction } from "chromadb";
import fs from "fs";

const chroma_client = new ChromaClient();

const OPENAI_API_KEY = fs.readFileSync('openai.key', 'utf8').trim(); // trim the trailing newline
const embedder = new OpenAIEmbeddingFunction(OPENAI_API_KEY);
const collection = await chroma_client.createCollection("my_collection", {}, embedder);


const jsonBytes = fs.readFileSync('out.json', 'utf8'); // contains {"documentList": [...], "metaList": [...]}
const json = JSON.parse(jsonBytes);

// create id
let counter = 0;
const ids = json.documentList.map(() => "id" + (++counter)); // convert to id
await collection.add(
  ids,
  undefined,
  json.metaList,
  json.documentList,
);

console.log('total documents',await collection.count());

package.json:

{
  "dependencies": {
    "chromadb": "^1.3.1",
    "openai": "^3.2.1"
  }
}

kokizzu avatar Mar 11 '23 20:03 kokizzu

+1

rrubio avatar Mar 18 '23 10:03 rrubio

@kokizzu @rrubio hello! I agree this error message is not helpful. It happens because the Chroma backend, which the Docker container serves on localhost:8000 by default, is not up. I was able to reproduce this locally. Can you confirm that you are running the Docker container? Currently the JS library is a client only and talks to a backend.
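
A quick way to confirm the backend is reachable before creating a collection (a minimal sketch; it assumes the v1.x JS client's heartbeat() method and the default localhost:8000 address):

import { ChromaClient } from "chromadb";

const client = new ChromaClient(); // defaults to http://localhost:8000

try {
  // heartbeat() pings the server and rejects if the backend is unreachable
  await client.heartbeat();
} catch (err) {
  console.error("Chroma backend not reachable on localhost:8000 -- start the Docker container first:", err.message);
  process.exit(1);
}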

jeffchuber avatar Mar 18 '23 16:03 jeffchuber

following up here! would love to get to the bottom of this

jeffchuber avatar Mar 29 '23 20:03 jeffchuber

@jeffchuber, I weirdly got this error while trying to ingest a lot of PDF files, 10 or so. Each file is split into chunks and embedded before being inserted using langchain.

When ingesting 2-4 PDF files at a time, it was OK.

Error:

TypeError: Cannot read properties of undefined (reading 'data')
    at <anonymous> (/<removed>/node_modules/.pnpm/[email protected]/node_modules/chromadb/dist/main/index.js:136:29)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at Collection.add (/<removed>/node_modules/.pnpm/[email protected]/node_modules/chromadb/dist/main/index.js:124:26)
    at Chroma.addVectors (/<removed>/node_modules/.pnpm/[email protected]_bte6oiujo3pczovvxy7tybwkpm/node_modules/langchain/dist/vectorstores/chroma.js:78:9)
    at Chroma.addDocuments (/<removed>/node_modules/.pnpm/[email protected]_bte6oiujo3pczovvxy7tybwkpm/node_modules/langchain/dist/vectorstores/chroma.js:39:9)
    at Function.fromDocuments (/<removed>/node_modules/.pnpm/[email protected]_bte6oiujo3pczovvxy7tybwkpm/node_modules/langchain/dist/vectorstores/chroma.js:119:9)
    at run (/<removed>/scripts/ingest-data.ts:43:5)
    at <anonymous> (/<removed>/scripts/ingest-data.ts:53:3)

Code: (follows langchain + chroma doc)

// load doc
// ..
await Chroma.fromDocuments(docs, embeddings, {
      collectionName: CHROMA_COLLECTION_NAME,
    });

Edit: My apologies. This error has a different cause from the original question's. Should I make a separate issue here or on the langchain repo?

Edit2: After testing, I found I can only call addVectors with at most 499 items at a time. Anything higher causes this error.

NanoCode012 avatar Mar 31 '23 09:03 NanoCode012

@NanoCode012 499 is the dimensionality of the vectors? And anything over that breaks? That is very odd!

jeffchuber avatar Apr 02 '23 20:04 jeffchuber

@NanoCode012 499 is the dimensionality of the vectors? And anything over that breaks? That is very odd!

Sorry for the late reply @jeffchuber. No, I use openai, so the dimensionality follows the ada model's (1536).

The 499 is the number of openai embeddings inserted at one time; I had to loop my insertions so each batch stays under that. Also, I apologize if this is the wrong thread.
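
For the raw chromadb client, the same batching looks roughly like this (a sketch reusing the positional add() signature from the original post; BATCH and the array names metadatas/documents are illustrative):

const BATCH = 499; // empirically, batches of 500+ triggered the TypeError

for (let i = 0; i < ids.length; i += BATCH) {
  const j = Math.min(i + BATCH, ids.length);
  await collection.add(
    ids.slice(i, j),
    undefined,             // let the collection's embedder compute embeddings
    metadatas.slice(i, j),
    documents.slice(i, j),
  );
}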

NanoCode012 avatar Apr 05 '23 05:04 NanoCode012

I also encountered the same problem. All was good when I tested locally with 10-ish PDF files. However, in the prod environment, I hit this problem 100% of the time when trying to ingest 500 PDF files.

Mind sharing how to solve it? Thanks.

/home/xxx/node_modules/chromadb/dist/main/index.js:136
            return response.data;
                            ^

TypeError: Cannot read properties of undefined (reading 'data')
    at /home/xxx/node_modules/chromadb/dist/main/index.js:136:29
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async Collection.add (/home/xxx/node_modules/chromadb/dist/main/index.js:124:26)
    at async Chroma.addVectors (file:///home/xxx/node_modules/langchain/dist/vectorstores/chroma.js:78:9)
    at async Chroma.addDocuments (file:///home/xxx/node_modules/langchain/dist/vectorstores/chroma.js:39:9)
    at async Function.fromDocuments (file:///home/xxx/node_modules/langchain/dist/vectorstores/chroma.js:119:9)
    at async run (file:///home/xxx/dist/ingest/ingestdirectory.js:29:25)

The code:

const model = new OpenAI({
  modelName: "gpt-3.5-turbo",
  temperature: 0,
  cache: true,
  concurrency: 5,
  verbose: true,
  openAIApiKey: process.env.OPENAI_API_KEY,
});

// const directoryPath = "PDF_DB";
const loader = new DirectoryLoader(
  "PDF_DB/",
  {
    ".pdf": (path) => new PDFLoader(path),
  },
  true
);
const raw_docs = await loader.load();

const textSplitter = new RecursiveCharacterTextSplitter({ chunkSize: 200, chunkOverlap: 40 });
const docs = await textSplitter.splitDocuments(raw_docs);

const vectorStore = await Chroma.fromDocuments(
  docs,
  new OpenAIEmbeddings(),
  {
    collectionName: "collection_name",
  }
);

happybit avatar Apr 13 '23 18:04 happybit

Hello @happybit, I re-created how langchain inserts docs but edited it to submit at most 499 embeddings at a time.

FYI: for me it's not 500 docs, but 500 embeddings.

const docs = await textSplitter.splitDocuments(rawDocs);
const embeddings = new OpenAIEmbeddings();

const texts = docs.map(({ pageContent }) => pageContent);
const embedTexts = await embeddings.embedDocuments(texts);

const chroma = new Chroma(embeddings, {
  collectionName: CHROMA_COLLECTION_NAME,
});

// Add the documents to the vector store in pieces
let j = 0;
const increment = 499;
for (let i = 0; i < docs.length; i += increment) {
  j = Math.min(i + increment, docs.length);
  await chroma.addVectors(embedTexts.slice(i, j), docs.slice(i, j));
}
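
Slicing both arrays with the same i and j is what keeps each embedding aligned with its document.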

NanoCode012 avatar Apr 14 '23 00:04 NanoCode012

@NanoCode012 I will try. Thanks a lot!

happybit avatar Apr 14 '23 02:04 happybit

@NanoCode012 Great, it works like a charm. Thank you so much 😄

happybit avatar Apr 14 '23 07:04 happybit

I suspect this is fixed by the switch from axios to fetch.
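
For context, the opaque TypeError happens when an axios-style handler reads response.data after a failed request left response undefined. A guarded version (an illustrative sketch, not the actual chromadb source) would surface the real problem:

// Illustrative sketch only, not the library's code. With axios-style
// handlers, a network failure can leave `response` undefined, so
// `response.data` throws the opaque TypeError above; a guard gives a
// useful message instead.
function unwrap(response) {
  if (response === undefined) {
    throw new Error("No response from the Chroma backend -- is it running on localhost:8000?");
  }
  return response.data;
}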

jeffchuber avatar Jun 23 '23 16:06 jeffchuber