
Evaluation task blocked

anoukbarnoud-doctolib opened this issue 1 year ago

[ ] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug
I use ragas to evaluate my RAG system. For now, I have 100 question/context/answer triples to evaluate. However, when the task arrives at the end, it gets blocked and never finishes the job. I thought it was related to a limit on the number of requests made, so I tried chunking my dataset into batches of 25 data points, but I hit the same blockage:

[screenshot of the stalled evaluation]

Ragas version: ^0.1.3
Python version: 3.9

Code to Reproduce

    from datasets import Dataset
    from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
    from ragas import evaluate
    from ragas.metrics import (
        answer_relevancy,
        context_precision,
        context_recall,
        faithfulness,
    )
    from ragas.metrics.critique import harmfulness

    # azure_openai_api_key, azure_openai_endpoint, azure_configs, raw_df,
    # evaluation_path, logger, prepare_evaluation_dataframe and split_dataframe
    # are defined elsewhere in my script

    # init the LLM used by the metrics
    azure_model = AzureChatOpenAI(
        api_key=azure_openai_api_key,
        azure_endpoint=azure_openai_endpoint,
        openai_api_version="2023-03-15-preview",
        azure_deployment=azure_configs["model_deployment"],
        model=azure_configs["model_name"],
    )

    # init the embeddings for answer_relevancy, answer_correctness and answer_similarity
    azure_embeddings = AzureOpenAIEmbeddings(
        api_key=azure_openai_api_key,
        openai_api_version="2023-03-15-preview",
        azure_endpoint=azure_configs["base_url"],
        azure_deployment=azure_configs["embedding_deployment"],
        model=azure_configs["embedding_name"],
    )

    # list of metrics we're going to use
    metrics = [
        faithfulness,
        answer_relevancy,
        context_recall,
        context_precision,
        harmfulness,
    ]

    clean_df = prepare_evaluation_dataframe(raw_df)

    # evaluate in chunks of 25 rows to stay under a suspected request limit
    chunks = split_dataframe(clean_df, 25)
    chunk_index = 1
    for chunk in chunks:
        print(f" [-] Getting evaluation metrics for chunk {chunk_index}")
        try:
            chunk_dataset = Dataset.from_pandas(chunk)
            chunk_result = evaluate(
                chunk_dataset,
                metrics=metrics,
                llm=azure_model,
                embeddings=azure_embeddings,
            )
            df_chunk_result = chunk_result.to_pandas()
            csv_output_filename = f"intermediaire_metrics_clean_evaluation_dataset_{chunk_index}.csv"
            df_chunk_result.to_csv(evaluation_path + csv_output_filename, sep=";", index=False)
            logger.info(f" [-] Saving {evaluation_path + csv_output_filename} csv file to local.")
            chunk_index += 1

        except Exception as e:
            print(f"Error {e}, item {chunk}")

Error trace

Expected behavior
I would expect the evaluation task to finish, or at least to raise an error.

Additional context

anoukbarnoud-doctolib avatar Mar 07 '24 09:03 anoukbarnoud-doctolib

I can also confirm that this is happening to me: it gets stuck at the end of the evaluation.

~~Using ragas==0.1.2 worked for me~~ Using ragas==0.1.1 worked for me.

chenzimin avatar Mar 07 '24 13:03 chenzimin

Try adding a pause after the evaluation of each chunk. I was having the same issue, and inserting time.sleep(60) after each chunk resolved the problem.
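
In the loop from the original report, that looks roughly like this (a sketch; chunks, metrics, azure_model and azure_embeddings are assumed to be set up as above):

    import time

    from datasets import Dataset
    from ragas import evaluate

    for chunk in chunks:
        chunk_dataset = Dataset.from_pandas(chunk)
        chunk_result = evaluate(
            chunk_dataset, metrics=metrics, llm=azure_model, embeddings=azure_embeddings
        )
        # pause between chunks so pending requests can drain before the next batch
        time.sleep(60)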

LilianSilveira avatar Mar 07 '24 14:03 LilianSilveira

> I can also confirm that this is happening to me: it gets stuck at the end of the evaluation.
>
> ~~Using ragas==0.1.2 worked for me~~ Using ragas==0.1.1 worked for me.

Downgrading to 0.1.1 didn't help in my case :(

izikeros avatar May 15 '24 11:05 izikeros

Same issue (tried with ragas v0.1.0, v0.1.8, v0.1.7).

Any other ideas or workarounds?

Berber31 avatar May 22 '24 15:05 Berber31

I have the same issue. I even tried it with only 5 samples, and it still gets stuck on the last one, so annoying.

The quoted suggestion below didn't work; I even tried with only one instance and with 5 metrics, and it got stuck on the last metric for that one instance.

> Try adding a pause after the evaluation of each chunk. I was having the same issue, and inserting time.sleep(60) after each chunk resolved the problem.

damlitos avatar Jun 05 '24 12:06 damlitos

hey @anoukbarnoud-doctolib @chenzimin @izikeros @Berber31 @damlitos!

First of all, sorry for the delay in my response and the trouble it might have caused you, but thanks to @NIL-zhuang we have a fix for this issue. With this patch, requests that are not served by the server will be timed out, which will result in NaN values for some rows. Do account for that when you analyse the results.
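
For reference, one way to account for those NaN rows when analysing the results (a sketch, assuming df is the output of result.to_pandas() for the metrics used in the original report):

    # rows whose requests timed out carry NaN scores after the patch
    metric_cols = [
        "faithfulness", "answer_relevancy", "context_recall",
        "context_precision", "harmfulness",
    ]
    nan_rows = df[df[metric_cols].isna().any(axis=1)]
    print(f"{len(nan_rows)} of {len(df)} rows timed out")
    # pandas ignores NaN by default, so the mean is taken over the rows that succeeded
    means = df[metric_cols].mean()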

I would like to thank @NIL-zhuang again for helping us fix this issue as part of the community 🙂 ❤️

jjmachan avatar Jun 06 '24 10:06 jjmachan

I got the same issue. For me, it was simply due to the fact that the assessment LLM (gpt-4o served via Azure in my case) rejected certain requests with the openai.BadRequestError exception because the prompt message did not pass OpenAI's moderation filter.

Ragas does not explicitly handle this exception, and as a result the evaluate() method gets stuck at the end on the failing query, since by default it has the max_retries argument set to 10 and the max_wait argument set to 60 seconds.

I solved the problem by simply running evaluate() with raise_exceptions=False (so that it returns np.nan for the failed entries) and run_config=RunConfig(max_retries=3, max_wait=20) (to limit the waiting time at the end of the evaluation).
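
In code, that workaround looks roughly like this (a sketch; dataset, metrics, azure_model and azure_embeddings are assumed to be set up as in the original report):

    from ragas import evaluate
    from ragas.run_config import RunConfig

    result = evaluate(
        dataset,
        metrics=metrics,
        llm=azure_model,
        embeddings=azure_embeddings,
        raise_exceptions=False,  # failed entries come back as np.nan instead of hanging
        run_config=RunConfig(max_retries=3, max_wait=20),  # cap retry/back-off time
    )
    df = result.to_pandas()
    df_clean = df.dropna()  # drop the entries that failed, e.g. moderation-filtered prompts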

antigregory avatar Jun 13 '24 19:06 antigregory

I still have an almost identical issue. On Friday my script ran perfectly, and now I am trying an evaluation with 5 different metrics. The evaluation script takes a huge amount of time on the 5th step, while the first 4 steps ran very fast. Please let me know which version of the ragas package is working fine.

Change that resolved my issue: a plain pip install ragas -q installs version 0.1.9, which does not have the thread-timeout fix, so I used !pip install git+https://github.com/explodinggradients/ragas directly, which resolved my issue. Now it's not failing, but it gives a 0 value for a few metrics.

prabham17 avatar Jun 24 '24 05:06 prabham17

For me the issue still exists, even with https://github.com/explodinggradients/ragas. Can anyone let us know which version of ragas we can use?

Remya-Ramachandran avatar Jun 24 '24 15:06 Remya-Ramachandran

Apologies for the hard time, but this has been fixed in the latest releases (#1093). Closing this for now, but let me know if it's still not working for you. Will help you out.

jjmachan avatar Aug 02 '24 06:08 jjmachan