Synthetic Test Data generation doesn't output columns ['answer'] on testset
Your Question

Synthetic Test Data generation doesn't output the ['answer'] column as in this example.

Code Examples

Load Doc
```python
from langchain_community.document_loaders import DirectoryLoader
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

loader = DirectoryLoader(
    "./Docs/", use_multithreading=True, silent_errors=True, sample_size=1
)
documents = loader.load()

for document in documents:
    document.metadata["filename"] = document.metadata["source"]
```
Load Modules (I use Azure OpenAI)
```python
from langchain_openai.chat_models import AzureChatOpenAI
from langchain_openai.embeddings import AzureOpenAIEmbeddings
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

generator_llm = AzureChatOpenAI(
    azure_deployment='gpt-4o',
    api_version="2024-02-15-preview",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    validate_base_url=False,
)
critic_llm = AzureChatOpenAI(
    azure_deployment='gpt-4o',
    api_version="2024-02-15-preview",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    validate_base_url=False,
)
embeddings = AzureOpenAIEmbeddings(
    azure_deployment="text-embedding-3-large",
    openai_api_version="2024-02-15-preview",
)
```
Generate Test
```python
generator = TestsetGenerator.from_langchain(
    generator_llm=generator_llm, critic_llm=critic_llm, embeddings=embeddings
)
testset = generator.generate_with_langchain_docs(
    documents,
    test_size=10,
    raise_exceptions=False,
    with_debugging_logs=False,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)
```
Display dataset/testset columns
```python
# testset to pandas
testset.to_pandas().columns
```

```
Index(['question', 'contexts', 'ground_truth', 'evolution_type', 'metadata',
       'episode_done'],
      dtype='object')
```
```python
testset_to_dataset = testset.to_dataset()
```

```
Dataset({
    features: ['question', 'contexts', 'ground_truth', 'evolution_type', 'metadata', 'episode_done'],
    num_rows: 10
})
```
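For reference, a quick way to see which of the columns the metrics will expect is missing from this dataset (the `required` set below is written out by hand from the Ragas metrics docs and the error message further down, not queried from any Ragas API):

```python
# Columns the answer-based metrics expect, listed manually here
required = {"question", "answer", "contexts", "ground_truth"}
missing = required - set(testset_to_dataset.column_names)
print(missing)  # {'answer'}
```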
Run evaluate on the testset_to_dataset
```python
from ragas import evaluate
from ragas.metrics import (
    context_precision,
    answer_relevancy,
    faithfulness,
    context_recall,
)
from ragas.metrics.critique import harmfulness

# list of metrics we're going to use
metrics = [
    faithfulness,
    answer_relevancy,
    context_recall,
    context_precision,
    harmfulness,
]

result = evaluate(
    testset_to_dataset, metrics=metrics, llm=generator_llm, embeddings=embeddings
)
result
```
```
ValueError: The metric [faithfulness] that that is used requires the following additional columns ['answer'] to be present in the dataset.
```
@shayefarti Hello there! I'm here to help you with any issues you encounter. Let me know how I can assist you.
To ensure that the 'answer' column is included in the generated test set, you need to modify the generate_with_langchain_docs method to include the 'answer' column. Here is how you can do it:
- Generate Responses for Questions: Use a function to query your `query_engine` with each question and compile the results into a DataFrame that includes the "answer" column.
- Create the Dataset: Use the generated responses to create a dataset that includes the "answer" column.
Here is an example code snippet to help you generate the "answer" column:
```python
from datasets import Dataset
from tqdm.auto import tqdm
import pandas as pd

def generate_response(query_engine, question):
    response = query_engine.query(question)
    return {
        "answer": response.response,
        "contexts": [c.node.get_content() for c in response.source_nodes],
    }

def generate_ragas_dataset(query_engine, test_df):
    test_questions = test_df["question"].values
    responses = [generate_response(query_engine, q) for q in tqdm(test_questions)]
    dataset_dict = {
        "question": test_questions,
        "answer": [response["answer"] for response in responses],
        "contexts": [response["contexts"] for response in responses],
        "ground_truth": test_df["ground_truth"].values.tolist(),
    }
    ds = Dataset.from_dict(dataset_dict)
    return ds

# Assuming `query_engine` and `test_df` are already defined
ragas_eval_dataset = generate_ragas_dataset(query_engine, test_df)
ragas_evals_df = pd.DataFrame(ragas_eval_dataset)
print(ragas_evals_df.head())
```
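Note that the snippet above assumes a LlamaIndex-style `query_engine`. Since the pipeline in the question is built with LangChain, a rough equivalent of `generate_response` might look like the sketch below; `retriever` (e.g. something like `vectorstore.as_retriever()`) and the prompt wording are assumptions, not part of the original code:

```python
def generate_response(retriever, llm, question):
    # retrieve supporting chunks with your own LangChain retriever (assumption)
    docs = retriever.invoke(question)
    context = "\n\n".join(d.page_content for d in docs)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    answer = llm.invoke(prompt)  # e.g. the AzureChatOpenAI instance from above
    return {
        "answer": answer.content,
        "contexts": [d.page_content for d in docs],
    }
```

The rest of `generate_ragas_dataset` can stay the same.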
Additionally, ensure that your TestsetGenerator is correctly configured to use the Azure OpenAI models:
```python
azure_model = AzureChatOpenAI(
    openai_api_version="2023-05-15",
    azure_endpoint=azure_configs["base_url"],
    azure_deployment=azure_configs["model_deployment"],
    model=azure_configs["model_name"],
    validate_base_url=False,
)
azure_embeddings = AzureOpenAIEmbeddings(
    openai_api_version="2023-05-15",
    azure_endpoint=azure_configs["base_url"],
    azure_deployment=azure_configs["embedding_deployment"],
    model=azure_configs["embedding_name"],
)

generator = TestsetGenerator.from_langchain(
    generator_llm=azure_model, critic_llm=azure_model, embeddings=azure_embeddings
)
testset = generator.generate_with_langchain_docs(
    documents,
    test_size=10,
    raise_exceptions=False,
    with_debugging_logs=False,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)
```
This configuration ensures that the 'answer' column is generated in the test set [1][2].
@dosu Hi, so if I understand correctly, it looks like a bug, doesn't it?
The issue you're experiencing with the 'answer' column not being generated in the test set when using the generate_with_langchain_docs method is likely due to a missing step in the configuration or generation process rather than a bug. The method relies on the generate method, which is responsible for the overall generation process, including the 'answer' column.
To ensure the 'answer' column is generated, you should verify the following:
- Ensure that the `distributions` parameter is correctly set to include the necessary evolutions (a small sanity check is sketched after this list).
- Check if the `init_evolution` method properly initializes the evolutions with the required LLMs and docstore.
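For completeness, a small sanity check on the `distributions` mapping (this only re-states what is already in your code; the probabilities should sum to 1.0):

```python
from ragas.testset.evolutions import simple, reasoning, multi_context

# question-type distribution passed to generate_with_langchain_docs
distributions = {simple: 0.5, reasoning: 0.25, multi_context: 0.25}
assert abs(sum(distributions.values()) - 1.0) < 1e-9
```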
You can reuse the example snippet from the previous reply to generate the 'answer' column.
Ensure that your TestsetGenerator is correctly configured to use the Azure OpenAI models and that the distributions parameter includes the necessary evolutions to generate the 'answer' column [1][2].
Hi @shayefarti,
You can look at this previous answer to help you solve this issue: https://github.com/explodinggradients/ragas/issues/1084#issuecomment-2248219601
To address your example in depth: if I recall correctly, I ran into this issue with the tutorials and concluded that this example tutorial was written back when TestsetGenerator.generate_with_langchain_docs still generated an 'answer' column. Nowadays that column is meant to hold the results of your RAG, while the generated responses to the synthetic question/contexts pairs live under the column named "ground_truth". So to evaluate your RAG you need to generate the 'answer' column yourself if you want to measure the quality of the RAG's responses (answer_relevancy, answer_similarity, answer_correctness, faithfulness).
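Concretely, a minimal sketch of what that wiring could look like (here `my_rag_chain` is just a placeholder for whatever component of your RAG returns an answer string for a question; ideally the contexts would also come from your RAG's retriever rather than from the testset):

```python
from datasets import Dataset

test_df = testset.to_pandas()  # question, contexts, ground_truth, ...

eval_dataset = Dataset.from_dict({
    "question": test_df["question"].tolist(),
    # 'answer' must be produced by *your* RAG; the testset generator does not create it
    "answer": [my_rag_chain.invoke(q) for q in test_df["question"]],
    # ideally the contexts retrieved by your RAG; the testset's source
    # contexts are used here only as a stand-in
    "contexts": test_df["contexts"].tolist(),
    # the synthetic reference answers already live in 'ground_truth'
    "ground_truth": test_df["ground_truth"].tolist(),
})
```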
@shayefarti thanks for bringing this up, it is actually a documentation error; do refer to @Gwenn-LR's excellent explanation (thanks a lot @Gwenn-LR ❤️).
These are the columns now:
@shahules786 should we update the documentation with the new table?