error on VectorstoreIndexCreator
System Info
Running on Kaggle.
Reproduction
I get this error:
ValidationError                           Traceback (most recent call last)
Cell In[3], line 33
     31     index = VectorstoreIndexCreator(vectorstore_kwargs={"persist_directory": "persist", "embedding": embedding_function}).from_loaders([loader])
     32 else:
---> 33     index = VectorstoreIndexCreator(vectorstore_kwargs={"embedding": embedding_function}).from_loaders([loader])
     35 chain = ConversationalRetrievalChain.from_llm(
     36     llm=model,
     37     retriever=index.vectorstore.as_retriever(search_kwargs={"k": 1}),
     38 )
     40 chat_history = []

File /opt/conda/lib/python3.10/site-packages/pydantic/v1/main.py:341, in BaseModel.__init__(__pydantic_self__, **data)
    339 values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
    340 if validation_error:
--> 341     raise validation_error
    342 try:
    343     object_setattr(__pydantic_self__, '__dict__', values)

ValidationError: 1 validation error for VectorstoreIndexCreator
embedding
  field required (type=value_error.missing)
Please help me solve this.
Expected behavior
When I try to use RAG, I get this error.
I'm getting the same error! Did you manage to find a solution?
same here
Not sure what you are trying to do but in my case I was using LangChain and running this example https://python.langchain.com/docs/integrations/document_loaders/hugging_face_dataset/
It seems to work with the changes below (each change is explained in the comment above it):
# import from langchain.indexes.vectorstore rather than langchain.indexes as in the example
from langchain.indexes.vectorstore import VectorstoreIndexCreator
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_openai import ChatOpenAI
from langchain_community.document_loaders.hugging_face_dataset import (
    HuggingFaceDatasetLoader,
)
embeddings = HuggingFaceEmbeddings()
dataset_name = "tweet_eval"
page_content_column = "text"
name = "stance_climate"
loader = HuggingFaceDatasetLoader(dataset_name, page_content_column, name)
# pass the embedding as a parameter; the example leaves it empty
index = VectorstoreIndexCreator(embedding=embeddings).from_loaders([loader])
# temperature is numeric, so pass 0 rather than the string "0"
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
query = "What are the most used hashtag?"
# looks like we need to pass an llm now
result = index.query(query, llm=llm)
I hope it helps
I can confirm this issue as well, while following the code at https://github.com/Ryota-Kawamura/LangChain-for-LLM-Application-Development
In addition, the LangChain docs show the code below as runnable, but it fails with the error shown after it:
from langchain.indexes import VectorstoreIndexCreator
from langchain_community.document_loaders.hugging_face_dataset import (
    HuggingFaceDatasetLoader,
)
dataset_name = "tweet_eval"
page_content_column = "text"
name = "stance_climate"
loader = HuggingFaceDatasetLoader(dataset_name, page_content_column, name)
index = VectorstoreIndexCreator().from_loaders([loader])
Error:
ValidationError: 1 validation error for VectorstoreIndexCreator
embedding
  field required (type=value_error.missing)
Below are the installed packages from my Pipfile; apart from the pinned openai, all are at the latest version:
[packages]
langchain = "*"
python-dotenv = "*"
openai = "==0.28"
langchain-community = "*"
langchain-core = "*"
tiktoken = "*"
docarray = "*"
Following, same error. Tried a few alternatives...
I have the same error.
Hello 👋 Here is how I solved it:
Step by step
- The error says that the required field "embedding" is missing
- I went to the LangChain documentation and yes, it is mentioned there, see here
- As I understand it, what is missing is the embedding model to use
- So I added the following lines:
Code I changed
# Make the necessary import
from langchain.embeddings import OpenAIEmbeddings
# Instantiate the embeddings model
embeddings = OpenAIEmbeddings()
# Pass it in, since embedding is a required parameter
index = VectorstoreIndexCreator(
    embedding=embeddings,
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])
Hope it helps, Cheers
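For the code in the original post, the same fix would presumably be to move the embedding out of vectorstore_kwargs and pass it as the top-level embedding argument. Below is a minimal sketch, not a tested fix. It assumes `loader` and `embedding_function` are the objects from the original post, and that the vector store class in use accepts `persist_directory` (Chroma does; other stores may not). `build_index` is a hypothetical helper name, not part of LangChain.

```python
def build_index(loader, embedding_function, persist_directory=None):
    """Hypothetical helper: build the index with `embedding` passed at the top level."""
    # Imported lazily so the helper can be defined before LangChain is imported.
    from langchain.indexes.vectorstore import VectorstoreIndexCreator

    # Store-specific options still go in vectorstore_kwargs.
    # Assumption: the vector store class accepts `persist_directory`.
    vectorstore_kwargs = {}
    if persist_directory is not None:
        vectorstore_kwargs["persist_directory"] = persist_directory

    creator = VectorstoreIndexCreator(
        embedding=embedding_function,  # the required field the error complains about
        vectorstore_kwargs=vectorstore_kwargs,
    )
    return creator.from_loaders([loader])
```

It could then be called as build_index(loader, embedding_function, persist_directory="persist") for the persisted case, or build_index(loader, embedding_function) otherwise.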
Reminder of the error
ValidationError                           Traceback (most recent call last)
Cell In[15], line 5
      1 # index = VectorstoreIndexCreator(
      2 #     vectorstore_cls=DocArrayInMemorySearch
      3 # ).from_loaders([loader])
----> 5 index = VectorstoreIndexCreator(
      6     vectorstore_cls=DocArrayInMemorySearch
      7 ).from_documents([docs])

File ~/.pyenv/versions/3.10.6/envs/deep_env/lib/python3.10/site-packages/pydantic/main.py:341, in pydantic.main.BaseModel.__init__()

ValidationError: 1 validation error for VectorstoreIndexCreator
embedding
  field required (type=value_error.missing)
This worked for me, thank you:)