About Exception raised in Job[]: TimeoutError()
[ ] I have checked the documentation and related resources and couldn't resolve my bug.
Describe the bug When I run the evaluation function, a TimeoutError is raised frequently and the final evaluation metrics all show NaN. I followed the documentation and tried various RunConfig settings, but it has no effect.
Ragas version: 0.2.9
Python version: 3.12
Code to Reproduce

```python
faithfulness.llm = qwen_llm
answer_relevancy.llm = qwen_llm
answer_relevancy.embeddings = bge_m3
context_recall.llm = qwen_llm
context_precision.llm = qwen_llm

evalsets = load_test_data(uploaded_file)
if evalsets is not None:
    batch_size = 4
    final_df = pd.DataFrame()
    try:
        result = evaluate(
            evalsets,
            metrics=metrics,
            batch_size=batch_size,
            raise_exceptions=True,
            callbacks=[TestCallback()],
            run_config=RunConfig(timeout=60, max_retries=3, max_wait=60, max_workers=1),
        )
        final_df = result.to_pandas()
        print("final_df:", final_df.head())
    except Exception as e:
        print(f"Error during evaluation: {e}")
```
Error trace
Evaluating: 0%| | 0/8 [00:00<?, ?it/s]Exception raised in Job[0]: TimeoutError()
Evaluating: 25%|████████████████████▊ | 2/8 [02:03<06:10, 61.83s/it]Exception raised in Job[2]: TimeoutError()
Evaluating: 38%|███████████████████████████████▏ | 3/8 [03:03<05:04, 61.00s/it]Exception raised in Job[3]: TimeoutError()
Evaluating: 50%|█████████████████████████████████████████▌ | 4/8 [04:03<04:02, 60.61s/it]Exception raised in Job[4]: TimeoutError()
Evaluating: 62%|███████████████████████████████████████████████████▉ | 5/8 [05:03<03:01, 60.39s/it]Exception raised in Job[5]: TimeoutError()
Evaluating: 75%|██████████████████████████████████████████████████████████████▎ | 6/8 [06:03<02:00, 60.26s/it]Exception raised in Job[6]: TimeoutError()
Evaluating: 88%|████████████████████████████████████████████████████████████████████████▋ | 7/8 [07:03<01:00, 60.18s/it]Exception raised in Job[7]: TimeoutError()
Evaluating: 100%|███████████████████████████████████████████████████████████████████████████████████| 8/8 [08:03<00:00, 60.39s/it]
final_df:                                                                        user_input                                                                                     retrieved_contexts  ... context_precision context_recall
0  Please give a detailed introduction to the history and development of the Olympic Games.  [The Olympics, officially known as the Olympic Games, are the world's largest and most influential co...  ...               NaN            NaN
1  Please give a detailed introduction to the history and development of the Olympic Games.  [The Olympics, officially known as the Olympic Games, are the world's largest and most influential co...  ...               NaN            NaN
Expected behavior This issue didn't come up at first, and I wondered if it was due to server load or other factors such as the network, or if there was a problem with the evaluation parameters I had configured. Thanks.
Any updates on this? I am facing a similar issue and not sure how to proceed.
I have the following code:
```python
import os
import json
import requests

from langchain_openai import ChatOpenAI
from ragas import EvaluationDataset, evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import LLMContextPrecisionWithReference

os.environ["OPENAI_API_KEY"] = current_config.get('openai_api_key')
os.environ["RAGAS_APP_TOKEN"] = current_config.get('ragas_api_token')

with open("dataset.json", "r", encoding="utf-8") as file:
    dataset = json.load(file)

datasetWithResponse = []

if "dataset" in dataset and isinstance(dataset["dataset"], list):
    for item in dataset["dataset"]:
        question = item.get("user_input")
        if question:
            payload = {
                "message": {"role": "user", "content": question},
                "metadata_fields": "retrieved_docs",
            }
            headers = {
                'Accept': 'application/json',
                'Content-Type': 'application/json'
            }
            print(f"Asking {question}")
            response = requests.post(api_url, headers=headers, json=payload)
            if response.status_code == 200:
                response_data = response.json()
                body = json.loads(response_data["body"])
                # pull the retrieved documents out of the "ai" message metadata
                retrieved_docs = next((msg["metadata"]["retrieved_docs"] for msg in body["messages"] if msg["role"] == "ai"), None)
                retrieved_contexts = [doc["text"] if isinstance(doc, dict) and "text" in doc else "" for doc in retrieved_docs] if retrieved_docs else []
                datasetWithResponse.append({
                    "user_input": question,
                    "retrieved_contexts": retrieved_contexts,
                    "reference": item.get("reference")
                })
            else:
                print(f"Error for question: {question}, Status Code: {response.status_code}")
else:
    print("Invalid dataset format.")

evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o", request_timeout=120))
metric = LLMContextPrecisionWithReference(llm=evaluator_llm)
evaluation_dataset = EvaluationDataset.from_list(datasetWithResponse)
evalResults = evaluate(evaluation_dataset, metrics=[metric], batch_size=1)
print(evalResults)
evalResults.upload()
```
Just in case someone else is still facing this issue, this worked for me:

```python
from ragas.run_config import RunConfig
...
results = evaluate(dataset=ragas_dataset, metrics=metrics, run_config=RunConfig(max_workers=1))
...
```

The default for max_workers is 16; reducing it made the evaluation slower, but I stopped having the timeout problem.
https://docs.ragas.io/en/latest/references/run_config/
I tried this, but it didn't work; I still get a timeout in Job[2] (context_precision).
I would keep playing around with the RunConfig options. max_workers solved my problem, but you might want to try timeout, max_wait, or a combination of them; see the sketch below.
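For example, here is a minimal sketch that combines those options, reusing the `ragas_dataset` and `metrics` names from the snippet above. The specific numbers are only illustrative starting points, not verified values, and need tuning against your own LLM backend:

```python
from ragas import evaluate
from ragas.run_config import RunConfig

# Illustrative values only: a longer per-call timeout than the 60 s used earlier
# in this thread, fewer concurrent workers than the default of 16, and a capped
# retry backoff. Tune these against your endpoint's latency and rate limits.
run_config = RunConfig(
    timeout=300,     # seconds allowed per LLM/embedding call
    max_workers=2,   # limit concurrency to avoid overloading the endpoint
    max_retries=5,   # retry transient failures
    max_wait=120,    # cap the wait between retries
)

results = evaluate(
    dataset=ragas_dataset,
    metrics=metrics,
    run_config=run_config,
)
print(results)
```

If calls still time out with a single worker and a generous timeout, the bottleneck is more likely the model endpoint itself (load, rate limits, network) than the Ragas configuration.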