ragas The runner thread which was running the jobs raised an exeception.

ExceptionInRunner Traceback (most recent call last)
Cell In[56], line 22
     11 embeddings = client.embeddings.create(
     12     input=inputs,
     13     model="RAG_text-embedding-3-large"
     14 )
     16 generator = TestsetGenerator.from_langchain(
     17     generator_llm,
     18     critic_llm,
     19     embeddings
     20 )
---> 22 testset = generator.generate_with_langchain_docs(documents, test_size=10, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25})

File c:\Users\10742161\AppData\Local\anaconda3\envs\rag\lib\site-packages\ragas\testset\generator.py:206, in TestsetGenerator.generate_with_langchain_docs(self, documents, test_size, distributions, with_debugging_logs, is_async, raise_exceptions, run_config)
    204 distributions = distributions or {}
    205 # chunk documents and add to docstore
--> 206 self.docstore.add_documents(
    207     [Document.from_langchain_document(doc) for doc in documents]
    208 )
    210 return self.generate(
    211     test_size=test_size,
    212     distributions=distributions,
   (...)
    216     run_config=run_config,
    217 )

File c:\Users\10742161\AppData\Local\anaconda3\envs\rag\lib\site-packages\ragas\testset\docstore.py:214, in InMemoryDocumentStore.add_documents(self, docs, show_progress)
    209 # split documents with self.splitter into smaller nodes
    210 nodes = [
    211     Node.from_langchain_document(d)
    212     for d in self.splitter.transform_documents(docs)
    213 ]
--> 214 self.add_nodes(nodes, show_progress=show_progress)

File c:\Users\10742161\AppData\Local\anaconda3\envs\rag\lib\site-packages\ragas\testset\docstore.py:253, in InMemoryDocumentStore.add_nodes(self, nodes, show_progress)
    251 results = executor.results()
    252 if not results:
--> 253     raise ExceptionInRunner()
    255 for i, n in enumerate(nodes):
    256     if i in nodes_to_embed.keys():

ExceptionInRunner: The runner thread which was running the jobs raised an exeception. Read the traceback above to debug it. You can also pass raise_exceptions=False incase you want to show only a warning message instead.

Collecting ragas==0.1.9

Describe the bug

While generating a synthetic dataset, the following ExceptionInRunner error appeared:

ExceptionInRunner: The runner thread which was running the jobs raised an exeception. Read the traceback above to debug it. You can also pass raise_exceptions=False incase you want to show only a warning message instead.

Ragas version: 0.1.9
Python version: 3.10

Code to Reproduce

from langchain_community.document_loaders import DirectoryLoader

loader = DirectoryLoader(r"Data\Rag_data")
documents = loader.load()

for document in documents:
    document.metadata['filename'] = document.metadata['source']

from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain.llms import AzureOpenAI

generator_llm = AzureOpenAI(api_key=open_api_key, api_version=azure_api_version, azure_endpoint=azure_endpoint)
critic_llm = AzureOpenAI(api_key=open_api_key, api_version=azure_api_version, azure_endpoint=azure_endpoint)

embeddings = client.embeddings.create(
    input=inputs,
    model="RAG_text-embedding-3-large"
)

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings
)

testset = generator.generate_with_langchain_docs(documents, test_size=10, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25})

Error trace

Expected behavior

Generation of a synthetic/ground-truth dataset in JSON format, with the "filename" metadata as provided.

Oct 03 '24 12:10 tushitdave

Hey @tushitdave! I'm here to help you out with any bugs or questions you have while waiting for a human maintainer.

A similar issue was reported in the Ragas repository, where the ExceptionInRunner was raised. The suggested solution for this issue involves ensuring that the OPENAI_API_KEY environment variable is set correctly with a valid OpenAI API key. You can set the API key using the following code snippet:

import os
from getpass import getpass
import openai

# Set your OpenAI API key if it is not already set as an environment variable
if not (openai_api_key := os.getenv("OPENAI_API_KEY")):
    openai_api_key = getpass("🔑 Enter your OpenAI API key: ")
openai.api_key = openai_api_key
os.environ["OPENAI_API_KEY"] = openai_api_key

Additionally, you can pass raise_exceptions=False in the evaluate function to show only a warning message instead of raising an exception:

import os

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import context_recall

data_samples = {
    'question': ['When was the first super bowl?', 'Who won the most super bowls?'],
    'answer': ['The first superbowl was held on Jan 15, 1967', 'The most super bowls have been won by The New England Patriots'],
    'contexts' : [['The First AFL–NFL World Championship Game was an American football game played on January 15, 1967, at the Los Angeles Memorial Coliseum in Los Angeles,'], 
    ['The Green Bay Packers...Green Bay, Wisconsin.','The Packers compete...Football Conference']],
    'ground_truth': ['The first superbowl was held on January 15, 1967', 'The New England Patriots have won the Super Bowl a record six times']
}

dataset = Dataset.from_dict(data_samples)

# Ensure the API key is correct
os.environ["OPENAI_API_KEY"] = "your-valid-openai-key"

score = evaluate(dataset, metrics=[context_recall], raise_exceptions=False)
score.to_pandas()

This approach will prevent the ExceptionInRunner from stopping your execution and will only show a warning message ^[1].
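To illustrate what the flag changes, here is a minimal, library-independent sketch of the raise-versus-warn pattern. It is not Ragas's implementation; `run_jobs`, `JobFailed`, and `failing_job` are hypothetical names used only for illustration. The point is that suppressing the exception leaves a gap in the results rather than fixing whatever made the job fail.

```python
import warnings


class JobFailed(Exception):
    """Hypothetical error type standing in for a runner-level exception."""


def run_jobs(jobs, raise_exceptions=True):
    """Run zero-argument callables in order and collect their results.

    With raise_exceptions=True the first failure aborts the run.
    With raise_exceptions=False a warning is emitted and None is stored instead.
    """
    results = []
    for index, job in enumerate(jobs):
        try:
            results.append(job())
        except Exception as exc:
            if raise_exceptions:
                # Chain the original error so the root cause stays visible.
                raise JobFailed(f"job {index} failed: {exc!r}") from exc
            warnings.warn(f"job {index} failed: {exc!r}")
            results.append(None)
    return results


def failing_job():
    """Simulate a job that fails, for example because of a rejected API key."""
    raise ValueError("invalid API key")


if __name__ == "__main__":
    # The failure is downgraded to a warning, and the second result is None.
    print(run_jobs([lambda: 1, failing_job, lambda: 3], raise_exceptions=False))
```

If the warning still appears with `raise_exceptions=False`, the underlying cause (credentials, deployment name, dependency mismatch) is still present and needs to be found separately.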

To continue talking to Dosu, mention @dosu.

Oct 03 '24 12:10 dosubot[bot]

@dosu

I did as you suggested, but the error remains the same. Please note that I am generating a synthetic dataset and using AzureOpenAI instead of OpenAI.

from langchain_community.document_loaders import DirectoryLoader

loader = DirectoryLoader(r"Data\Rag_data")
documents = loader.load()

for document in documents:
    document.metadata['filename'] = document.metadata['source']

from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain.llms import AzureOpenAI

from getpass import getpass
import openai

if not (openai_api_key := os.getenv("OPENAI_API_KEY")):
    openai_api_key = getpass("Used my azureopenAI key here")
openai.api_key = openai_api_key
os.environ["OPENAI_API_KEY"] = openai_api_key

generator_llm = AzureOpenAI(api_key=open_api_key, api_version=azure_api_version, azure_endpoint=azure_endpoint)

critic_llm = AzureOpenAI(api_key=open_api_key, api_version=azure_api_version, azure_endpoint=azure_endpoint)

embeddings = client.embeddings.create(
    input=inputs,
    model="RAG_text-embedding-3-large"
)

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings
)

testset = generator.generate_with_langchain_docs(documents, test_size=10, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25}, raise_exceptions=False)

In the code above, I am using LangChain for synthetic data generation, as described at the link below:

https://docs.ragas.io/en/v0.1.12/getstarted/testset_generation.html#get-started-testset-generation

The only change I am making is to use the AzureOpenAI client.

Oct 04 '24 03:10 tushitdave

When using AzureOpenAI with the Ragas library, ensure that you have configured the Azure OpenAI settings correctly. Here are the steps and configurations you need to verify:

Installation: Make sure you have installed the langchain_openai package:
```
pip install langchain_openai
```
Configuration: Ensure that you have set the Azure OpenAI parameters correctly:
- base_url: The endpoint URL for your Azure OpenAI service.
- model_deployment: The deployment name for your model.
- model_name: The name of your model.
- embedding_deployment: The deployment name for your embeddings.
- embedding_name: The name of your embedding model, typically "text-embedding-ada-002".

Code Setup: Initialize the Azure OpenAI models and embeddings using the provided configuration and wrap them with LangchainLLMWrapper and LangchainEmbeddingsWrapper:

from langchain_openai.chat_models import AzureChatOpenAI
from langchain_openai.embeddings import AzureOpenAIEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper

azure_configs = {
    "base_url": "https://<your-endpoint>.openai.azure.com/",
    "model_deployment": "your-deployment-name",
    "model_name": "your-model-name",
    "embedding_deployment": "your-deployment-name",
    "embedding_name": "text-embedding-ada-002",
}

azure_llm = AzureChatOpenAI(
    openai_api_version="2023-05-15",
    azure_endpoint=azure_configs["base_url"],
    azure_deployment=azure_configs["model_deployment"],
    model=azure_configs["model_name"],
    validate_base_url=False,
)

azure_embeddings = AzureOpenAIEmbeddings(
    openai_api_version="2023-05-15",
    azure_endpoint=azure_configs["base_url"],
    azure_deployment=azure_configs["embedding_deployment"],
    model=azure_configs["embedding_name"],
)

azure_llm = LangchainLLMWrapper(azure_llm)
azure_embeddings = LangchainEmbeddingsWrapper(azure_embeddings)

Ensure that all these configurations are correctly set up in your code. If the error persists, double-check the Azure OpenAI service configurations and ensure that the API keys and endpoints are correctly configured ^[1].
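Note that the snippet above passes no API key explicitly. As far as I know, `langchain_openai` then falls back to environment variables such as `AZURE_OPENAI_API_KEY` and `AZURE_OPENAI_ENDPOINT` rather than `OPENAI_API_KEY`; treat the exact variable names as an assumption to verify against the version you have installed. A small sketch for checking them before constructing the clients (`missing_env_vars` and `REQUIRED_VARS` are hypothetical names, not part of Ragas or LangChain):

```python
import os

# Assumed variable names; confirm them against your installed langchain_openai version.
REQUIRED_VARS = ("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or blank in the given environment mapping."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing or empty:", ", ".join(missing))
    else:
        print("All expected Azure OpenAI variables are set.")
```

If you pass `azure_endpoint` explicitly, as in the snippet above, only the key variable should matter; the endpoint check is included for setups that rely on the environment for both.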

Oct 04 '24 03:10 dosubot[bot]

@dosu I applied your suggestion, but the error is still as follows: ExceptionInRunner: The runner thread which was running the jobs raised an exeception. Read the traceback above to debug it. You can also pass raise_exceptions=False incase you want to show only a warning message instead.

Code:

from langchain_openai.chat_models import AzureChatOpenAI
from langchain_openai.embeddings import AzureOpenAIEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

embedding_deployment = "RAG_text-embedding-3-large"

azure_llm = AzureChatOpenAI(
    openai_api_version=azure_api_version,
    azure_endpoint=azure_endpoint,
    azure_deployment=deployment,
    model='gpt-4o-mini',
    validate_base_url=False,
)

critic_azure_llm = AzureChatOpenAI(
    openai_api_version=azure_api_version,
    azure_endpoint=azure_endpoint,
    azure_deployment=deployment,
    model='gpt-4o-mini',
    validate_base_url=False,
)

azure_embeddings = AzureOpenAIEmbeddings(
    openai_api_version=embedding_api_version,
    azure_endpoint=embedding_endpoint,
    azure_deployment=embedding_deployment,
    model="text-embedding-3-large",
)

azure_llm = LangchainLLMWrapper(azure_llm)
critic_azure_llm = LangchainLLMWrapper(critic_azure_llm)
azure_embeddings = LangchainEmbeddingsWrapper(azure_embeddings)

from langchain_community.document_loaders import DirectoryLoader

loader = DirectoryLoader(r"Data\Rag_data")
documents = loader.load()

for document in documents:
    document.metadata['filename'] = document.metadata['source']

generator = TestsetGenerator.from_langchain(
    azure_llm,
    critic_azure_llm,
    azure_embeddings
)

testset = generator.generate_with_langchain_docs(documents, test_size=10, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25}, raise_exceptions=False)

I am not able to generate the synthetic dataset.

Oct 04 '24 04:10 tushitdave

The ExceptionInRunner error you're encountering indicates that an exception occurred in the executor thread. Here are a few steps to help troubleshoot and resolve the issue:

1. Check Azure Configuration: Ensure that all Azure configurations are correct, including azure_endpoint, azure_deployment, and model names. Double-check that these match your Azure OpenAI service settings.
2. Validate API Versions: Make sure the openai_api_version and embedding_api_version are set correctly. They should match the versions supported by your Azure OpenAI deployment.
3. Model and Embedding Names: Verify that the model names (gpt-4o-mini and text-embedding-3-large) are correctly specified and available in your Azure deployment.
4. Debugging Logs: Since you have set raise_exceptions=False, the error might not stop execution, but it will still log warnings. Check the logs for any additional error messages or warnings that might provide more context.
5. Environment Variables: Ensure that your environment variables, especially OPENAI_API_KEY, are set correctly and accessible in your script.
6. Network and Permissions: Check if there are any network issues or permission settings that might be preventing access to the Azure OpenAI service.

If these steps do not resolve the issue, consider enabling debugging logs by setting with_debugging_logs=True in the generate_with_langchain_docs method to get more detailed output, which might help identify the root cause of the problem.
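Because the runner hides the original error, it can also help to call the models directly, outside Ragas, so that the real exception (authentication, wrong deployment name, unsupported API version) surfaces on its own. The sketch below assumes the unwrapped LangChain objects, which expose `invoke` on chat models and `embed_query` on embeddings. `smoke_test` is a hypothetical helper, and the stub classes exist only to make the example runnable without network access.

```python
def smoke_test(llm, embeddings, prompt="ping", text="hello"):
    """Call each client once and return a dict of component name -> error (or None)."""
    checks = {
        "llm": lambda: llm.invoke(prompt),
        "embeddings": lambda: embeddings.embed_query(text),
    }
    report = {}
    for name, call in checks.items():
        try:
            call()
            report[name] = None
        except Exception as exc:  # record the original error instead of hiding it
            report[name] = f"{type(exc).__name__}: {exc}"
    return report


class StubLLM:
    """Stand-in for a chat model; replace with your AzureChatOpenAI instance."""

    def invoke(self, prompt):
        return "pong"


class StubEmbeddings:
    """Stand-in that fails the way a misnamed deployment might."""

    def embed_query(self, text):
        raise RuntimeError("deployment not found")


if __name__ == "__main__":
    print(smoke_test(StubLLM(), StubEmbeddings()))
```

Run it against the plain `AzureChatOpenAI` and `AzureOpenAIEmbeddings` objects before wrapping them with `LangchainLLMWrapper` and `LangchainEmbeddingsWrapper`; any entry that is not `None` points at the component to fix first.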

Oct 04 '24 04:10 dosubot[bot]

@dosu: I have checked all of the steps above, 1 through 6, and everything complies with the requirements. Nothing is wrong.

Oct 04 '24 04:10 tushitdave

@tushitdave Same issue here. I've run into the same issue multiple times with Ragas. It seems to be a common problem; I've seen several open tickets on this and no easy solution. I believe the root cause might be incompatible dependencies.

The main challenge is the lack of clear documentation or locked dependency versions. Without locked versions, it's tough to resolve issues with Langchain/Ragas dependencies. Unfortunately, neither the docs nor the repo provide the correct versions for reference.

For example, the project's pyproject.toml file doesn't specify versions for most dependencies.

It's impossible to generate synthetic datasets. You need some luck to get the correct dependencies, which I haven't had lately.

Oct 08 '24 01:10 arianpasquali

This is a tough nut to crack; I could really use your help on this, @jjmachan.

Oct 08 '24 01:10 dosubot[bot]

This issue is related to RAGAS v0.1.9, which is now significantly outdated. The testset generation API has been completely redesigned in version 0.3.8+.

Please upgrade and follow the docs for the latest usage.

Nov 03 '25 11:11 anistark