
ValidationError with OpenAIEmbeddings in Ragas

Open · SushmitaSingh96 opened this issue on Jul 29, 2024 · 2 comments

A ValidationError occurs when trying to use the evaluate function from Ragas with the OpenAIEmbeddings model. The error message suggests using AzureOpenAIEmbeddings when using Azure, but I am using the standard OpenAI API.

To Reproduce

Steps to reproduce the behavior:

  1. Use the following code from the Ragas documentation: Metrics - Answer Correctness
  2. Run the code, and observe the error.
from datasets import Dataset 
from ragas.metrics import answer_correctness
from ragas import evaluate

data_samples = {
    'question': ['When was the first super bowl?', 'Who won the most super bowls?'],
    'answer': ['The first superbowl was held on Jan 15, 1967', 'The most super bowls have been won by The New England Patriots'],
    'ground_truth': ['The first superbowl was held on January 15, 1967', 'The New England Patriots have won the Super Bowl a record six times']
}
dataset = Dataset.from_dict(data_samples)
score = evaluate(dataset, metrics=[answer_correctness])
score.to_pandas()

Error Message

  181 def embedding_factory(
  182     model: str = "text-embedding-ada-002", run_config: t.Optional[RunConfig] = None
  183 ) -> BaseRagasEmbeddings:
--> 184     openai_embeddings = OpenAIEmbeddings(model=model)
  185     if run_config is not None:
...
  343     object_setattr(__pydantic_self__, '__dict__', values)

ValidationError: 1 validation error for OpenAIEmbeddings
__root__
  If you are using Azure, please use the `AzureOpenAIEmbeddings` class. (type=value_error)

Expected behavior

The code should execute without errors, generating the evaluation scores and converting them to a pandas DataFrame.

Suggested Workaround

The issue seems to be related to the LangChain framework, and a suggested workaround can be found here.

The code change is:

embeddings = OpenAIEmbeddings(openai_api_type="openai")

Environment:

  • OS: MacOS
  • Python version: 3.11.9
  • Ragas version: 0.1.10
  • langchain_openai version: 0.1.7
  • openai version: 1.36.1

SushmitaSingh96 avatar Jul 29 '24 09:07 SushmitaSingh96

Hey, @SushmitaSingh96! I'm here to help you with bugs, questions, and more while we wait for a human maintainer. Let's tackle this together!

To resolve the ValidationError when using the evaluate function from Ragas with the OpenAIEmbeddings model, ensure that you are passing the correct instances of the embeddings and LLM models. Since you are using the standard OpenAI API and not Azure, you should wrap your OpenAI embeddings and LLM instances with the appropriate wrappers provided by Ragas.

Here is an example of how to do this:

from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.metrics import answer_correctness
from ragas import evaluate
from datasets import Dataset

# Initialize OpenAI embeddings and LLM
openai_embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
openai_llm = OpenAI(model="text-davinci-003")

# Wrap them with Ragas wrappers
wrapped_embeddings = LangchainEmbeddingsWrapper(openai_embeddings)
wrapped_llm = LangchainLLMWrapper(openai_llm)

# Prepare your data
data_samples = {
    'question': ['When was the first super bowl?', 'Who won the most super bowls?'],
    'answer': ['The first superbowl was held on Jan 15, 1967', 'The most super bowls have been won by The New England Patriots'],
    'ground_truth': ['The first superbowl was held on January 15, 1967', 'The New England Patriots have won the Super Bowl a record six times']
}
dataset = Dataset.from_dict(data_samples)

# Perform evaluation
score = evaluate(dataset, metrics=[answer_correctness], llm=wrapped_llm, embeddings=wrapped_embeddings)
print(score.to_pandas())

This code ensures that the OpenAI embeddings and LLM are correctly wrapped and passed to the evaluate function, which should resolve the ValidationError [1][2].

dosubot[bot] avatar Jul 29 '24 10:07 dosubot[bot]

thanks for reporting this @SushmitaSingh96, it seems like the fix you suggested should help, but I'm not sure why I don't see it on my end

have you set OPENAI_API_TYPE in your env ?

jjmachan avatar Jul 30 '24 05:07 jjmachan

I encountered the same problem and tried the same code from dosubot. Even after running the exact code it produced, I keep getting this error. Is this due to some discrepancy on my side?

pydantic_core._pydantic_core.ValidationError: 1 validation error for OpenAIEmbeddings
openai_api_key
  Extra inputs are not permitted [type=extra_forbidden, input_value='no-key', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/extra_forbidden

Nandhini25S avatar Jan 08 '25 10:01 Nandhini25S

hey @Nandhini25S that is strange but it seems to be coming from this line

openai_embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

i.e., when initializing the embedding model?

jjmachan avatar Jan 08 '25 14:01 jjmachan

@jjmachan Yes, it looks like the error indeed comes from that line. I've referred to the documentation and tried upgrading all the packages to the latest versions, but no luck so far T_T

Nandhini25S avatar Jan 08 '25 15:01 Nandhini25S