
error on VectorstoreIndexCreator

Open MohammadAminDHM opened this issue 1 year ago • 9 comments

System Info

Running on Kaggle.

Reproduction

I get this error:


ValidationError                           Traceback (most recent call last)
Cell In[3], line 33
     31     index = VectorstoreIndexCreator(vectorstore_kwargs={"persist_directory": "persist", "embedding": embedding_function}).from_loaders([loader])
     32 else:
---> 33     index = VectorstoreIndexCreator(vectorstore_kwargs={"embedding": embedding_function}).from_loaders([loader])
     35 chain = ConversationalRetrievalChain.from_llm(
     36     llm=model,
     37     retriever=index.vectorstore.as_retriever(search_kwargs={"k": 1}),
     38 )
     40 chat_history = []

File /opt/conda/lib/python3.10/site-packages/pydantic/v1/main.py:341, in BaseModel.__init__(__pydantic_self__, **data)
    339 values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
    340 if validation_error:
--> 341     raise validation_error
    342 try:
    343     object_setattr(__pydantic_self__, '__dict__', values)

ValidationError: 1 validation error for VectorstoreIndexCreator
embedding
  field required (type=value_error.missing)

Please help me solve this.

Expected behavior

When I try to use RAG, I get this error.

MohammadAminDHM avatar May 06 '24 12:05 MohammadAminDHM
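
One possible reading of the traceback above: the embedding object is passed inside vectorstore_kwargs, while the error names a required top-level field called embedding. The sketch below reproduces that pattern with a plain dataclass as a hypothetical stand-in. IndexCreatorStandIn, try_build and fake_embedding are illustrative names, not LangChain APIs, and a dataclass raises TypeError rather than pydantic's ValidationError, so only the shape of the mistake carries over.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class IndexCreatorStandIn:
    """Hypothetical stand-in: one required field plus an optional kwargs dict."""

    embedding: Any  # required, like the field named in the error message
    vectorstore_kwargs: Dict[str, Any] = field(default_factory=dict)


def try_build(**kwargs: Any) -> str:
    """Return "ok" if construction succeeds, otherwise the error message."""
    try:
        IndexCreatorStandIn(**kwargs)
        return "ok"
    except TypeError as exc:
        return str(exc)


fake_embedding = object()  # placeholder for a real embeddings object

# Nested inside vectorstore_kwargs: the required top-level field is still missing.
nested = try_build(vectorstore_kwargs={"embedding": fake_embedding})

# Passed as its own keyword argument: construction succeeds.
top_level = try_build(embedding=fake_embedding)

print(nested)
print(top_level)
```

If this reading is right, moving the embedding out of vectorstore_kwargs and into its own embedding= argument is the change to try first.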

I'm getting the same error! Did you manage to find a solution?

jstoppa avatar May 06 '24 19:05 jstoppa

same here

chanyanhon avatar May 07 '24 03:05 chanyanhon

Hi all,

This seems to be an issue for LangChain, and not bitsandbytes.

matthewdouglas avatar May 07 '24 13:05 matthewdouglas

Not sure what you are trying to do but in my case I was using LangChain and running this example https://python.langchain.com/docs/integrations/document_loaders/hugging_face_dataset/

It seems to work with the changes below (see the comment above each changed line)

# import from langchain.indexes.vectorstore rather than langchain.indexes as in the example
from langchain.indexes.vectorstore import VectorstoreIndexCreator 
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_openai import ChatOpenAI

from langchain_community.document_loaders.hugging_face_dataset import (
    HuggingFaceDatasetLoader,
)

embeddings = HuggingFaceEmbeddings()

dataset_name = "tweet_eval"
page_content_column = "text"
name = "stance_climate"

loader = HuggingFaceDatasetLoader(dataset_name, page_content_column, name)

# pass the embedding as a parameter; the docs example leaves it out
index = VectorstoreIndexCreator(embedding=embeddings).from_loaders([loader])

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

query = "What are the most used hashtags?"

# looks like we need to pass an llm now
result = index.query(query, llm=llm)

I hope it helps

jstoppa avatar May 07 '24 14:05 jstoppa

I can confirm this issue as well, following the code at https://github.com/Ryota-Kawamura/LangChain-for-LLM-Application-Development

In addition, the LangChain docs show that we can run the code below, but it fails with the error that follows it:

from langchain.indexes import VectorstoreIndexCreator
from langchain_community.document_loaders.hugging_face_dataset import (
    HuggingFaceDatasetLoader,
)
dataset_name = "tweet_eval"
page_content_column = "text"
name = "stance_climate"

loader = HuggingFaceDatasetLoader(dataset_name, page_content_column, name)
index = VectorstoreIndexCreator().from_loaders([loader])

Error:

ValidationError: 1 validation error for VectorstoreIndexCreator
embedding
  field required (type=value_error.missing)

Below are the packages from my Pipfile; all are at their latest versions apart from the pinned openai:

[packages]
langchain = "*"
python-dotenv = "*"
openai = "==0.28"
langchain-community = "*"
langchain-core = "*"
tiktoken = "*"
docarray = "*"

jitvimol avatar May 23 '24 10:05 jitvimol
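
Because most entries in the package list above are "*", the list does not say which versions were actually installed when the error occurred. As a minimal sketch, the standard-library importlib.metadata module can report them. The helper name installed_versions is hypothetical, and the package names are copied from the list above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    report: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


# Package names taken from the [packages] list above.
PACKAGES = [
    "langchain",
    "langchain-community",
    "langchain-core",
    "openai",
    "tiktoken",
    "docarray",
    "python-dotenv",
]

for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```

Posting this output alongside the error makes it easier to tell whether the failure depends on a particular release.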

Following, same error. I tried a few alternatives...

dcsan avatar May 25 '24 02:05 dcsan

I have the same error.

weilingfeng98 avatar Jun 23 '24 13:06 weilingfeng98

Hello 👋 Here is how I solved it:

Step by step

  1. The error says that the required field "embedding" is missing
  2. I went to the LangChain documentation and yes, it's mentioned there, see here
  3. As I understand it, what is missing is the embedding model to use
  4. So I added the following lines:

Code I changed

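# Make the necessary imports
from langchain.embeddings import OpenAIEmbeddings
from langchain.indexes import VectorstoreIndexCreator
from langchain_community.vectorstores import DocArrayInMemorySearch

# Instantiate the embeddings model
embeddings = OpenAIEmbeddings()

# Pass it as the required `embedding` parameter
# (`loader` is the document loader created earlier)
index = VectorstoreIndexCreator(
    embedding=embeddings,
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])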

Hope it helps, Cheers

Reminder of the error

ValidationError                           Traceback (most recent call last)
Cell In[15], line 5
      1 # index = VectorstoreIndexCreator(
      2 #     vectorstore_cls=DocArrayInMemorySearch
      3 # ).from_loaders([loader])
----> 5 index = VectorstoreIndexCreator(
      6     vectorstore_cls=DocArrayInMemorySearch
      7 ).from_documents([docs])

File ~/.pyenv/versions/3.10.6/envs/deep_env/lib/python3.10/site-packages/pydantic/main.py:341, in pydantic.main.BaseModel.__init__()

ValidationError: 1 validation error for VectorstoreIndexCreator
embedding
  field required (type=value_error.missing)

syksteaz avatar Jul 04 '24 09:07 syksteaz
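
The first steps above amount to finding out which constructor arguments are required. As a rough sketch, inspect.signature can list the parameters that have no default. It is shown here on a hypothetical dataclass stand-in (IndexCreatorStandIn and required_parameters are illustrative names); whether the same call gives useful output for a particular pydantic model and version is an assumption to check, not something this thread confirms.

```python
import inspect
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class IndexCreatorStandIn:
    """Hypothetical stand-in with one required and one optional field."""

    embedding: Any
    vectorstore_kwargs: Dict[str, Any] = field(default_factory=dict)


def required_parameters(cls: type) -> List[str]:
    """Names of constructor parameters that have no default value."""
    params = inspect.signature(cls).parameters.values()
    return [
        p.name
        for p in params
        if p.default is inspect.Parameter.empty
        and p.kind not in (p.VAR_POSITIONAL, p.VAR_KEYWORD)
    ]


print(required_parameters(IndexCreatorStandIn))
```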

Quoting jstoppa's reply above (import VectorstoreIndexCreator from langchain.indexes.vectorstore, pass embedding= to it, and pass llm= to index.query):

This worked for me, thank you:)

Satvik-jain avatar Jul 25 '24 21:07 Satvik-jain