langchain IndexError: list index out of range when using Chroma.from_documents

System Info

LangChain 0.0.186, macOS Ventura, Python 3.10

Who can help?

No response

Information

[ ] The official example notebooks/scripts
[X] My own modified scripts

Related Components

[ ] LLMs/Chat Models
[ ] Embedding Models
[ ] Prompts / Prompt Templates / Prompt Selectors
[ ] Output Parsers
[ ] Document Loaders
[X] Vector Stores / Retrievers
[ ] Memory
[ ] Agents / Agent Executors
[ ] Tools / Toolkits
[ ] Chains
[ ] Callbacks/Tracing
[ ] Async

Reproduction

Why do I get IndexError: list index out of range when using Chroma.from_documents?

import os

from langchain.document_loaders import BiliBiliLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter

os.environ["OPENAI_API_KEY"] = "***"

loader = BiliBiliLoader(["https://www.bilibili.com/video/BV18o4y137n1/"])

documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)

documents = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()

db = Chroma.from_documents(documents, embeddings, persist_directory="./db")
db.persist()

Traceback (most recent call last):
  File "/bilibili/bilibili_embeddings.py", line 28, in <module>
    db = Chroma.from_documents(documents, embeddings, persist_directory="./db")
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/langchain/vectorstores/chroma.py", line 422, in from_documents
    return cls.from_texts(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/langchain/vectorstores/chroma.py", line 390, in from_texts
    chroma_collection.add_texts(texts=texts, metadatas=metadatas, ids=ids)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/langchain/vectorstores/chroma.py", line 160, in add_texts
    self._collection.add(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/chromadb/api/models/Collection.py", line 103, in add
    ids, embeddings, metadatas, documents = self._validate_embedding_set(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/chromadb/api/models/Collection.py", line 354, in _validate_embedding_set
    ids = validate_ids(maybe_cast_one_to_many(ids))
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/chromadb/api/types.py", line 82, in maybe_cast_one_to_many
    if isinstance(target[0], (int, float)):
IndexError: list index out of range
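
The last frame of the traceback indexes target[0] while validating ids, which would fail if the list of ids were empty. The snippet below is a simplified stand-in that reproduces that failure mode. It is not the chromadb source: only the target[0] access comes from the traceback, and the function name and the rest of the body are assumptions for illustration.

```python
def cast_one_to_many_sketch(target):
    """Simplified stand-in for the check in the last traceback frame.

    Not the chromadb source: only the `target[0]` access is taken from
    the traceback; everything else is assumed for illustration.
    """
    if isinstance(target, list):
        # Indexing the first element raises IndexError on an empty list.
        if isinstance(target[0], (int, float)):
            return [target]
    return target


try:
    # An empty list, e.g. no ids because no documents were produced.
    cast_one_to_many_sketch([])
except IndexError as exc:
    error_message = str(exc)
```

If this is what happens in the script above, printing len(documents) right before Chroma.from_documents should show 0.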

Expected behavior

The index is generated successfully in the persist_directory.

May 31 '23 02:05 fraywang

Same error with loader = YoutubeLoader.from_youtube_url('https://www.youtube.com/watch?v=6qB1pYwIAlw')

May 31 '23 03:05 fraywang

I'm having issues with the BiliBiliLoader when calling loader.load()

RuntimeError: This event loop is already running

Jun 01 '23 04:06 hanifaudah

Having the same issue

Jun 02 '23 02:06 iha2

I had the same issue and I noticed that I had not named my source directory consistently. I don't see where you specify the source directory, but that might be the issue.

Jun 02 '23 16:06 inputcoffee

Same problem here, even though I set the path and everything.

Jun 10 '23 03:06 Wamy-Dev

I got this error when the length of the documents was 0

Try checking the contents of documents before loading into Chroma
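
A minimal sketch of that check, written so it runs on its own. Doc is a hypothetical stand-in for LangChain's Document (only page_content and metadata are assumed), and require_non_empty is an illustrative helper name, not a LangChain or Chroma API.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def require_non_empty(documents):
    """Drop documents with blank text and fail early if nothing is left."""
    kept = [d for d in documents if d.page_content and d.page_content.strip()]
    if not kept:
        raise ValueError(
            "No non-empty documents to index; check what the loader returned."
        )
    return kept


# Example: one blank chunk is dropped, one real chunk is kept.
checked = require_non_empty([Doc(""), Doc("some transcript text")])
```

In the script from the issue, this would be called on the result of text_splitter.split_documents(documents), just before Chroma.from_documents, so an empty loader result is reported with a clear message instead of the IndexError.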

Jun 30 '23 10:06 acmoles

I get this error, but my documents list is not empty. Is it mandatory to have metadata for each document? For my use case I do not currently need document metadata, so I just ignore it.

Jul 17 '23 08:07 mateiAvram

Try using embedding instead of embeddings (notice the s at the end). Example:

Chroma.from_documents(documents=texts, embedding=embedding_function, persist_directory=persist_directory)

Sep 07 '23 16:09 Ahmad-Bunni

Hi, @fraywang,

I'm helping the LangChain team manage their backlog and am marking this issue as stale. It looks like you encountered an "IndexError: list index out of range" when using Chroma.from_documents in the LangChain library. Other users provided several suggestions and code snippets to troubleshoot the issue, but it seems that the problem remains unresolved.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and cooperation. If you have any further questions or updates, feel free to reach out.

Dec 07 '23 16:12 dosubot[bot]

I am also getting the same error

Jul 27 '24 17:07 Govindhkiruthi
