langchain

fix chroma update_document to embed entire documents, fixes a character-wise embedding bug

Open cnellington opened this issue 2 years ago • 1 comment

Chroma update_document full document embeddings bugfix

Chroma update_document takes a single document, but treats the page_content string of that document as a list of texts when computing the new document embedding.

This causes two problems. First, the resulting embedding for the updated document is incorrect: it is only the embedding of the first character of the new page_content. Second, every character of the new page_content string is embedded separately, using many tokens in the process.
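A minimal sketch of the failure mode, assuming the old code passed `list(page_content)` to an embedding method that expects a list of strings and then kept the first resulting vector. `CountingEmbeddings`, `buggy_embedding`, and `fixed_embedding` are hypothetical names for illustration only; `CountingEmbeddings.embed_documents` merely mimics the `List[str] -> List[List[float]]` shape of LangChain's `Embeddings.embed_documents` and uses a toy vector instead of a real model.

```python
from typing import List


class CountingEmbeddings:
    """Hypothetical stand-in for an embedding model (not a LangChain class).

    Counts how many texts it is asked to embed, as a rough proxy for token/API cost.
    """

    def __init__(self) -> None:
        self.texts_embedded = 0

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        self.texts_embedded += len(texts)
        # Toy "embedding": text length and number of spaces.
        return [[float(len(t)), float(t.count(" "))] for t in texts]


def buggy_embedding(embedder: CountingEmbeddings, page_content: str) -> List[float]:
    # Bug: list(str) splits the string into single characters, so every
    # character is embedded separately and only the first one's vector is kept.
    embeddings = embedder.embed_documents(list(page_content))
    return embeddings[0]


def fixed_embedding(embedder: CountingEmbeddings, page_content: str) -> List[float]:
    # Fix: wrap the whole string in a one-element list so it is embedded once.
    embeddings = embedder.embed_documents([page_content])
    return embeddings[0]


text = "hello chroma"
buggy, fixed = CountingEmbeddings(), CountingEmbeddings()
print(buggy_embedding(buggy, text), buggy.texts_embedded)  # [1.0, 0.0] 12
print(fixed_embedding(fixed, text), fixed.texts_embedded)  # [12.0, 1.0] 1
```

The buggy path embeds 12 one-character texts and returns the vector for "h"; the fixed path embeds the full 12-character string exactly once.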

Fixes #5582


Who can review?

Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested:

Tagging @dev2049 for vectorstore bugfix

cnellington, Jun 01 '23 23:06

thanks @cnellington!

dev2049, Jun 02 '23 01:06