langchain
fix chroma update_document to embed entire documents, fixes a character-wise embedding bug
Chroma update_document full document embeddings bugfix
Chroma update_document takes a single document, but treats the page_content string of that document as a list when getting the new document embedding.
This causes two problems: the resulting embedding for the updated document is incorrect (it embeds only the first character of the new page_content), and the embedding function is called for every character in the new page_content string, using many tokens in the process.
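The failure mode described above can be reproduced without Chroma or LangChain. The following is a minimal sketch, not the actual patch: `CountingEmbeddings` is a hypothetical stand-in whose `embed_documents` method is assumed to accept a list of strings and return one vector per item, and the toy vector it computes is purely illustrative.

```python
from typing import List


class CountingEmbeddings:
    """Hypothetical stand-in for an embedding function (not the LangChain class)."""

    def __init__(self) -> None:
        self.items_embedded = 0

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Iterates over whatever it is given; a bare str yields single characters.
        vectors = []
        for text in texts:
            self.items_embedded += 1
            # Toy "embedding": length and sum of code points of the item.
            vectors.append([float(len(text)), float(sum(map(ord, text)))])
        return vectors


page_content = "hello world"

# Buggy pattern: the string itself is passed where a list is expected,
# so every character is embedded and [0] is the first character's vector.
buggy = CountingEmbeddings()
buggy_vector = buggy.embed_documents(page_content)[0]

# Fixed pattern: wrap the string in a one-element list,
# so the whole document is embedded exactly once.
fixed = CountingEmbeddings()
fixed_vector = fixed.embed_documents([page_content])[0]

print(buggy.items_embedded, fixed.items_embedded)
```

In this sketch the buggy call embeds one item per character and returns the vector for `"h"` alone, while the fixed call embeds a single item covering the full string.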
Fixes #5582
Who can review?
Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested:
Tagging @dev2049 for vectorstore bugfix
thanks @cnellington!