langchain
False "Index not found" messages
System Info
0.1173
Who can help?
No response
Information
- [ ] The official example notebooks/scripts
- [X] My own modified scripts
Related Components
- [ ] LLMs/Chat Models
- [ ] Embedding Models
- [ ] Prompts / Prompt Templates / Prompt Selectors
- [ ] Output Parsers
- [ ] Document Loaders
- [X] Vector Stores / Retrievers
- [ ] Memory
- [ ] Agents / Agent Executors
- [ ] Tools / Toolkits
- [ ] Chains
- [ ] Callbacks/Tracing
- [ ] Async
Reproduction
1. Create Chroma vectorstore
2. Persist vectorstore
3. Use vectorstore once
4. Vectorstore no longer works, says "Index not found"
Expected behavior
It works.
between steps 3 and 4, are you ending one process and starting a new one, or are these two sequential calls to the vectorstore?
New process. What's messed up is it's a new process between 2 and 3 too lol, the vectorstore exists on the disk but will not load.
hm i was able to reproduce, and could only fix by specifying anonymized_telemetry=False in client settings (inspired by https://github.com/hwchase17/langchain/issues/2491#issuecomment-1499082189 from @sergerdn)
```python
import chromadb
from langchain.vectorstores.chroma import Chroma

# docs and embeddings are assumed to be defined earlier
db = Chroma.from_documents(docs, embeddings, persist_directory=".chroma_db")
db.persist()

# reload the persisted store with telemetry disabled
client_settings = chromadb.config.Settings(
    chroma_db_impl="duckdb+parquet",
    persist_directory=".chroma_db",
    anonymized_telemetry=False,
)
load_db = Chroma(embedding_function=embeddings, client_settings=client_settings, persist_directory=".chroma_db")
```
@atroyn is this expected behavior? if so we can add docs on how to properly load persisted db. could also probs add a load class method to Chroma for convenience that handles any non-obvious configuration
related to #2490, #2491, #3011
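A rough sketch of what such a convenience loader might look like, reusing only the settings from the snippet above. The names `persisted_chroma_settings` and `load_chroma` are hypothetical and not part of LangChain or Chroma, and the sketch assumes the chromadb 0.3.x `Settings` fields shown in this thread.

```python
from typing import Any, Dict


def persisted_chroma_settings(persist_directory: str) -> Dict[str, Any]:
    """Keyword arguments for chromadb.config.Settings, copied from the snippet above."""
    return {
        "chroma_db_impl": "duckdb+parquet",
        "persist_directory": persist_directory,
        "anonymized_telemetry": False,
    }


def load_chroma(persist_directory: str, embedding_function: Any):
    """Hypothetical helper: reopen a persisted Chroma store with explicit settings."""
    # Imported lazily so the settings helper can be used without these packages.
    import chromadb
    from langchain.vectorstores.chroma import Chroma

    settings = chromadb.config.Settings(**persisted_chroma_settings(persist_directory))
    return Chroma(
        embedding_function=embedding_function,
        client_settings=settings,
        persist_directory=persist_directory,
    )
```

With something like this, reloading would be `load_db = load_chroma(".chroma_db", embeddings)`, which keeps the non-obvious configuration in one place.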
oh i take it back, this actually does work for me
```python
db = Chroma.from_documents(docs, embeddings, persist_directory=".chroma_db")
db.persist()
load_db = Chroma(embedding_function=embeddings, persist_directory=".chroma_db")
load_db.similarity_search_with_score("foo bar")
```
@francisjervis could you share a snippet that i can use to reproduce?
I cannot now even get it to run the first time.

Step 1: create index
```python
from langchain.vectorstores.chroma import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import CharacterTextSplitter

persist_directory = 'chroma_ca_sources'

# load every .txt file under the source directory
loader = DirectoryLoader('./catenantrightssources', glob="**/*.txt", loader_cls=TextLoader, show_progress=True)
docs = loader.load()

text_splitter = CharacterTextSplitter(
    separator = "\n\n",
    chunk_size = 500,
    chunk_overlap = 100,
    length_function = len,
)
split = text_splitter.split_documents(docs)
for s in split:
    print(s)

# embed the chunks and persist the index to disk
embedding = OpenAIEmbeddings(openai_api_key="sk-.........", model="text-embedding-ada-002")
vectordb = Chroma.from_documents(documents=split, embedding=embedding, persist_directory=persist_directory)
vectordb.persist()
```
Step 2: query
```python
from langchain.vectorstores.chroma import Chroma
from langchain.embeddings import OpenAIEmbeddings

persist_directory = 'chroma_ca_sources'
embedding = OpenAIEmbeddings(openai_api_key="sk-...", model="text-embedding-ada-002")
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)
retriever = vectordb.as_retriever()
query = "what is a lease?"
result = retriever.get_relevant_documents(query=query)
print(result)
```
The second script fails with raise NoIndexException("Index not found, please create an instance before querying")
No, this is not a path error - it fails with absolute paths too.
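One way to double-check that the path really is not the problem is to list what was written to the persist directory before loading it in the second script. This is only a diagnostic sketch using the standard library. The helper name `describe_persist_directory` is made up for illustration, and it makes no assumption about which files Chroma is supposed to write.

```python
from pathlib import Path
from typing import List


def describe_persist_directory(persist_directory: str) -> List[str]:
    """Return the files under persist_directory as sorted relative paths.

    An empty list means the directory is missing or contains no files.
    """
    root = Path(persist_directory).expanduser().resolve()
    if not root.is_dir():
        return []
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
```

Calling `print(describe_persist_directory('chroma_ca_sources'))` at the top of the query script shows whether the first script left anything on disk at the location the second script resolves.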
For what it's worth, I was seeing the same message (Index not found) even with an empty ChromaDB. I upgraded chromadb from 0.3.23 to 0.3.25 and that fixed the error for me. This commit is probably related.
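For anyone wanting to confirm which chromadb version is installed before trying the upgrade, a small check along these lines may help. It is a sketch, not an official compatibility test: the 0.3.25 threshold is taken from the comment above, and the helper names `parse_version` and `chromadb_is_at_least` are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '0.3.25' into (0, 3, 25), keeping only the leading digits of each part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def chromadb_is_at_least(minimum: str = "0.3.25", installed: Optional[str] = None) -> bool:
    """Compare the installed chromadb version against a minimum version string."""
    if installed is None:
        try:
            installed = version("chromadb")
        except PackageNotFoundError:
            # chromadb is not installed in this environment
            return False
    return parse_version(installed) >= parse_version(minimum)
```

If `chromadb_is_at_least()` returns `False`, running `pip install --upgrade chromadb` is the upgrade path described in the comment above.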
Hi, @francisjervis. I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
From what I understand, the issue is about false "Index not found" messages when loading a persisted Chroma vector store through LangChain. You provided steps to reproduce the issue and expected the vector store to work without any errors. In the comments, there was a discussion about the steps to reproduce the issue and potential fixes, such as specifying anonymized_telemetry=False in client settings or upgrading chromadb from 0.3.23 to 0.3.25.
Before we close this issue, we wanted to check if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you for your understanding and contribution to the LangChain repository. Let us know if you have any further questions or concerns.