About Exception raised in Job[]: TimeoutError()
[ ] I have checked the documentation and related resources and couldn't resolve my bug.
Describe the bug When I run the evaluation function, a TimeoutError is raised frequently and the final evaluation metrics all show NaN. I followed the documentation and tried various RunConfig settings, but it has no effect.
Ragas version: 0.2.9
Python version: 3.12
Code to Reproduce

```python
faithfulness.llm = qwen_llm
answer_relevancy.llm = qwen_llm
answer_relevancy.embeddings = bge_m3
context_recall.llm = qwen_llm
context_precision.llm = qwen_llm

evalsets = load_test_data(uploaded_file)
if evalsets is not None:
    batch_size = 4
    final_df = pd.DataFrame()
    try:
        result = evaluate(
            evalsets,
            metrics=metrics,
            batch_size=batch_size,
            raise_exceptions=True,
            callbacks=[TestCallback()],
            run_config=RunConfig(timeout=60, max_retries=3, max_wait=60, max_workers=1),
        )
        final_df = result.to_pandas()
        print("final_df:", final_df.head())
    except Exception as e:
        print(f"Error during evaluation: {e}")
```
Error trace
Evaluating: 0%| | 0/8 [00:00<?, ?it/s]Exception raised in Job[0]: TimeoutError()
Evaluating: 25%|████████████████████▊ | 2/8 [02:03<06:10, 61.83s/it]Exception raised in Job[2]: TimeoutError()
Evaluating: 38%|███████████████████████████████▏ | 3/8 [03:03<05:04, 61.00s/it]Exception raised in Job[3]: TimeoutError()
Evaluating: 50%|█████████████████████████████████████████▌ | 4/8 [04:03<04:02, 60.61s/it]Exception raised in Job[4]: TimeoutError()
Evaluating: 62%|███████████████████████████████████████████████████▉ | 5/8 [05:03<03:01, 60.39s/it]Exception raised in Job[5]: TimeoutError()
Evaluating: 75%|██████████████████████████████████████████████████████████████▎ | 6/8 [06:03<02:00, 60.26s/it]Exception raised in Job[6]: TimeoutError()
Evaluating: 88%|████████████████████████████████████████████████████████████████████████▋ | 7/8 [07:03<01:00, 60.18s/it]Exception raised in Job[7]: TimeoutError()
Evaluating: 100%|███████████████████████████████████████████████████████████████████████████████████| 8/8 [08:03<00:00, 60.39s/it]
final_df:                                                                        user_input                                                                                     retrieved_contexts  ... context_precision context_recall
0  Please give a detailed introduction to the history and development of the Olympic Games.  [The Olympics, officially known as the Olympic Games, are the world's largest and most influential co...  ...               NaN            NaN
1  Please give a detailed introduction to the history and development of the Olympic Games.  [The Olympics, officially known as the Olympic Games, are the world's largest and most influential co...  ...               NaN            NaN
Expected behavior This issue didn't come up at first, and I wondered if it was due to server load or other factors such as the network, or if there was a problem with the evaluation parameters I had configured. Thanks.
Any updates on this? I am facing a similar issue and not sure how to proceed.
I have the following code:
```python
import os
import json
import requests

from langchain_openai import ChatOpenAI
from ragas import EvaluationDataset, evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import LLMContextPrecisionWithReference

os.environ["OPENAI_API_KEY"] = current_config.get('openai_api_key')
os.environ["RAGAS_APP_TOKEN"] = current_config.get('ragas_api_token')

with open("dataset.json", "r", encoding="utf-8") as file:
    dataset = json.load(file)

datasetWithResponse = []

if "dataset" in dataset and isinstance(dataset["dataset"], list):
    for item in dataset["dataset"]:
        question = item.get("user_input")
        if question:
            payload = {
                "message": {"role": "user", "content": question},
                "metadata_fields": "retrieved_docs",
            }
            headers = {
                'Accept': 'application/json',
                'Content-Type': 'application/json'
            }
            print(f"Asking {question}")
            response = requests.post(api_url, headers=headers, json=payload)
            if response.status_code == 200:
                response_data = response.json()
                body = json.loads(response_data["body"])
                # pull the retrieved documents out of the "ai" message metadata
                retrieved_docs = next((msg["metadata"]["retrieved_docs"] for msg in body["messages"] if msg["role"] == "ai"), None)
                retrieved_contexts = [doc["text"] if isinstance(doc, dict) and "text" in doc else "" for doc in retrieved_docs] if retrieved_docs else []
                datasetWithResponse.append({
                    "user_input": question,
                    "retrieved_contexts": retrieved_contexts,
                    "reference": item.get("reference")
                })
            else:
                print(f"Error for question: {question}, Status Code: {response.status_code}")
else:
    print("Invalid dataset format.")

evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o", request_timeout=120))
metric = LLMContextPrecisionWithReference(llm=evaluator_llm)
evaluation_dataset = EvaluationDataset.from_list(datasetWithResponse)
evalResults = evaluate(evaluation_dataset, metrics=[metric], batch_size=1)
print(evalResults)
evalResults.upload()
```
Just in case someone else is still facing this issue, this worked for me:

```python
from ragas.run_config import RunConfig
...
results = evaluate(dataset=ragas_dataset, metrics=metrics, run_config=RunConfig(max_workers=1))
...
```

The default for max_workers is 16; reducing it made the evaluation slower, but I stopped having the timeout problem.
https://docs.ragas.io/en/latest/references/run_config/
I tried this, but it didn't work; I still get a timeout in Job[2] (context_precision).
I would keep playing around with the RunConfig options. max_workers solved my problem, but you might want to try timeout, max_wait, or a combination of them; see the sketch below.
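For example, here is a minimal sketch that combines those options, reusing the `ragas_dataset` and `metrics` names from the snippet above. The specific numbers are only illustrative starting points, not verified values, and need tuning against your own LLM backend:

```python
from ragas import evaluate
from ragas.run_config import RunConfig

# Illustrative values only: a longer per-call timeout than the 60 s used earlier
# in this thread, fewer concurrent workers than the default of 16, and a capped
# retry backoff. Tune these against your endpoint's latency and rate limits.
run_config = RunConfig(
    timeout=300,     # seconds allowed per LLM/embedding call
    max_workers=2,   # limit concurrency to avoid overloading the endpoint
    max_retries=5,   # retry transient failures
    max_wait=120,    # cap the wait between retries
)

results = evaluate(
    dataset=ragas_dataset,
    metrics=metrics,
    run_config=run_config,
)
print(results)
```

If calls still time out with a single worker and a generous timeout, the bottleneck is more likely the model endpoint itself (load, rate limits, network) than the Ragas configuration.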