langchain
IndexError: list index out of range when using Chroma.from_documents
System Info
LangChain 0.0.186, macOS Ventura, Python 3.10
Who can help?
No response
Information
- [ ] The official example notebooks/scripts
- [X] My own modified scripts
Related Components
- [ ] LLMs/Chat Models
- [ ] Embedding Models
- [ ] Prompts / Prompt Templates / Prompt Selectors
- [ ] Output Parsers
- [ ] Document Loaders
- [X] Vector Stores / Retrievers
- [ ] Memory
- [ ] Agents / Agent Executors
- [ ] Tools / Toolkits
- [ ] Chains
- [ ] Callbacks/Tracing
- [ ] Async
Reproduction
Why do I get IndexError: list index out of range when using Chroma.from_documents?
import os

from langchain.document_loaders import BiliBiliLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter

os.environ["OPENAI_API_KEY"] = "***"
loader = BiliBiliLoader(["https://www.bilibili.com/video/BV18o4y137n1/"])
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
documents = text_splitter.split_documents(documents)
embeddings = OpenAIEmbeddings()
db = Chroma.from_documents(documents, embeddings, persist_directory="./db")
db.persist()
Traceback (most recent call last):
File "/bilibili/bilibili_embeddings.py", line 28, in
Expected behavior
The index is generated successfully in the persist_directory.
Same error with loader = YoutubeLoader.from_youtube_url('https://www.youtube.com/watch?v=6qB1pYwIAlw')
I'm having issues with the BiliBiliLoader when calling loader.load()
RuntimeError: This event loop is already running
Having the same issue
I had the same issue and I noticed that I had not named my source directory consistently. I don't see where you specify the source directory, but that might be the issue.
Same problem here. I set the path and everything.
I got this error when the length of the documents was 0
Try checking the contents of documents before loading into Chroma
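A minimal sketch of that check, in case it helps. It assumes each item exposes a page_content string attribute, as LangChain's Document does; the helper name non_empty_documents is hypothetical and not part of LangChain or Chroma:

```python
from typing import Any, List


def non_empty_documents(documents: List[Any]) -> List[Any]:
    """Return only documents with non-blank text; raise if none are left.

    Assumption: each item has a `page_content` string attribute
    (as LangChain's Document does). This helper is illustrative only.
    """
    kept = [
        doc for doc in documents
        if (getattr(doc, "page_content", "") or "").strip()
    ]
    if not kept:
        # Fail early with a readable message instead of passing an empty
        # list on to the vector store.
        raise ValueError(
            f"No non-empty documents to index (received {len(documents)}); "
            "check the loader output before calling Chroma.from_documents."
        )
    return kept
```

In the script from the report, this would be called on the result of text_splitter.split_documents(documents), before passing the list to Chroma.from_documents. If it raises, the loader (for example BiliBiliLoader or YoutubeLoader) probably returned no text, which matches the empty-list case described above.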
I get this error even though my documents list is not empty. I was wondering whether it is mandatory to have metadata for each document. I do not currently need document metadata for my use case, so I just ignore it.
Try using embedding instead of embeddings (notice the s at the end). Example:
Chroma.from_documents(documents=texts, embedding=embedding_function, persist_directory=persist_directory)
Hi, @fraywang,
I'm helping the LangChain team manage their backlog and am marking this issue as stale. It looks like you encountered an "IndexError: list index out of range" when using Chroma.from_documents in the LangChain library. Other users offered several suggestions and code snippets to troubleshoot the issue, but the problem appears to remain unresolved.
Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you for your understanding and cooperation. If you have any further questions or updates, feel free to reach out.
I am also getting the same error