Significance of docstore in TestsetGenerator.
[ ] I checked the documentation and related resources and couldn't find an answer to my question.
Your Question
What is the use of docstore in TestsetGenerator? How will it affect the evaluation process if I use it versus if I do not use it?
Code Examples
```python
class TestsetGenerator:
    generator_llm: BaseRagasLLM
    critic_llm: BaseRagasLLM
    embeddings: BaseRagasEmbeddings
    docstore: DocumentStore

    @classmethod
    def from_llama_index(
        cls,
        generator_llm: LlamaindexLLM,
        critic_llm: LlamaindexLLM,
        embeddings: LlamaIndexEmbeddings,
        docstore: t.Optional[DocumentStore] = None,
        run_config: RunConfig = RunConfig(),
    ) -> "TestsetGenerator":
        generator_llm_model = LlamaIndexLLMWrapper(generator_llm)
        critic_llm_model = LlamaIndexLLMWrapper(critic_llm)
        embeddings_model = LlamaIndexEmbeddingsWrapper(embeddings)
        keyphrase_extractor = KeyphraseExtractor(llm=generator_llm_model)
        if docstore is None:
            from langchain.text_splitter import TokenTextSplitter

            splitter = TokenTextSplitter(chunk_size=1024, chunk_overlap=0)
            docstore = InMemoryDocumentStore(
                splitter=splitter,
                embeddings=embeddings_model,
                extractor=keyphrase_extractor,
                run_config=run_config,
            )
        return cls(
            generator_llm=generator_llm_model,
            critic_llm=critic_llm_model,
            embeddings=embeddings_model,
            docstore=docstore,
        )
```
Additional context
When I generate the testset using TestsetGenerator.from_llama_index, I found that for many of the 'questions' the ground_truth comes out as 'The answer to given question is not present in context'. Please also help me understand where Ragas gets the ground_truth and contexts from.
The significance of docstore in TestsetGenerator is that it is used to store and manage the documents during the test set generation process. If you do not use docstore, the TestsetGenerator will create an InMemoryDocumentStore by default, which handles document storage and management internally.
Using a custom docstore allows you to have more control over how documents are stored, retrieved, and managed, which can be beneficial for specific use cases or optimizations.
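For example, you could build the docstore yourself with a different chunking strategy and hand it to `from_llama_index`. Below is a minimal sketch, not verbatim library code: the class names mirror the snippet later in this thread, but the exact import paths are assumptions based on the ragas 0.1 layout and may differ in your version.

```python
from langchain.text_splitter import TokenTextSplitter
from ragas.run_config import RunConfig
from ragas.embeddings import LlamaIndexEmbeddingsWrapper   # assumed import path
from ragas.llms import LlamaIndexLLMWrapper                # assumed import path
from ragas.testset.docstore import InMemoryDocumentStore   # assumed import path
from ragas.testset.extractor import KeyphraseExtractor     # assumed import path
from ragas.testset.generator import TestsetGenerator

# generator_llm, critic_llm and embeddings are your LlamaIndex models,
# exactly as used elsewhere in this thread
wrapped_llm = LlamaIndexLLMWrapper(generator_llm)
wrapped_embeddings = LlamaIndexEmbeddingsWrapper(embeddings)

custom_docstore = InMemoryDocumentStore(
    splitter=TokenTextSplitter(chunk_size=512, chunk_overlap=50),  # smaller chunks than the 1024 default
    embeddings=wrapped_embeddings,
    extractor=KeyphraseExtractor(llm=wrapped_llm),
    run_config=RunConfig(),
)

generator = TestsetGenerator.from_llama_index(
    generator_llm=generator_llm,
    critic_llm=critic_llm,
    embeddings=embeddings,
    docstore=custom_docstore,  # the default InMemoryDocumentStore is skipped
)
```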
When generating the test set using TestsetGenerator.from_llama_index, Ragas determines the ground_truth and contexts by chunking the provided documents and adding them to the docstore. The docstore then manages these documents and provides the necessary data for the evaluation process. The InMemoryDocumentStore specifically handles the splitting of documents into smaller nodes, embedding these nodes, and extracting keyphrases, which are then used to calculate document similarities and set relationships between nodes.
Here is the relevant code snippet:
```python
@classmethod
def from_llama_index(
    cls,
    generator_llm: LlamaindexLLM,
    critic_llm: LlamaindexLLM,
    embeddings: LlamaIndexEmbeddings,
    docstore: t.Optional[DocumentStore] = None,
    run_config: RunConfig = RunConfig(),
) -> "TestsetGenerator":
    generator_llm_model = LlamaIndexLLMWrapper(generator_llm)
    critic_llm_model = LlamaIndexLLMWrapper(critic_llm)
    embeddings_model = LlamaIndexEmbeddingsWrapper(embeddings)
    keyphrase_extractor = KeyphraseExtractor(llm=generator_llm_model)
    if docstore is None:
        from langchain.text_splitter import TokenTextSplitter

        splitter = TokenTextSplitter(chunk_size=1024, chunk_overlap=0)
        docstore = InMemoryDocumentStore(
            splitter=splitter,
            embeddings=embeddings_model,
            extractor=keyphrase_extractor,
            run_config=run_config,
        )
    return cls(
        generator_llm=generator_llm_model,
        critic_llm=critic_llm_model,
        embeddings=embeddings_model,
        docstore=docstore,
    )
```
And the method for generating the test set:
```python
def generate_with_llamaindex_docs(
    self,
    documents: t.Sequence[LlamaindexDocument],
    test_size: int,
    distributions: t.Optional[Distributions] = None,
    with_debugging_logs=False,
    is_async: bool = True,
    raise_exceptions: bool = True,
    run_config: t.Optional[RunConfig] = None,
):
    distributions = distributions or {}
    # chunk documents and add to docstore
    self.docstore.add_documents(
        [Document.from_llamaindex_document(doc) for doc in documents]
    )
    return self.generate(
        test_size=test_size,
        distributions=distributions,
        with_debugging_logs=with_debugging_logs,
        is_async=is_async,
        run_config=run_config,
        raise_exceptions=raise_exceptions,
    )
```
The InMemoryDocumentStore class handles the addition of documents by splitting them into smaller nodes, embedding these nodes, and extracting keyphrases. It also calculates similarities between nodes and sets relationships, which are crucial for the evaluation process [1][2].
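To illustrate the similarity step in isolation (a conceptual sketch with toy numbers, not Ragas internals): node embeddings are typically compared with cosine similarity, and sufficiently similar nodes can be linked as related, which is what makes things like multi-context questions possible.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for the embeddings of two document nodes
node_a = np.array([0.12, 0.80, 0.31])
node_b = np.array([0.10, 0.75, 0.40])

SIMILARITY_THRESHOLD = 0.9  # hypothetical cutoff for linking two nodes

if cosine_similarity(node_a, node_b) > SIMILARITY_THRESHOLD:
    print("nodes would be treated as related")
```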
@dosu please help me understand why, for many of the questions, I am getting 'ground_truth' as 'The answer to given question is not present in context' even though the context is available in the 'contexts' column.
You are getting 'ground_truth' as 'The answer to given question is not present in context' for many questions because the question_answer_prompt is designed to output this verdict if the answer is not explicitly present in the context. This is evident from the examples in the prompt, such as:
```python
{
    "context": """The novel "Pride and Prejudice" by Jane Austen revolves around the character Elizabeth Bennet and her family. The story is set in the 19th century in rural England and deals with issues of marriage, morality, and misconceptions.""",
    "question": "What year was 'Pride and Prejudice' published?",
    "answer": AnswerFormat.parse_obj(
        {
            "answer": "The answer to given question is not present in context",
            "verdict": "-1",
        }
    ).dict(),
}
```
In this example, the context does not contain the publication year of "Pride and Prejudice," leading to the verdict '-1'. If your context does not explicitly contain the information required to answer the question, the prompt will output that the answer is not present.
Additionally, if you are using a RAG (Retrieval-Augmented Generation) system, the ground_truth column can be ignored for online evaluations since RAG systems often perform reference-free evaluations. This means that the system does not rely on pre-defined ground-truth answers and instead evaluates the generated answers based on metrics like faithfulness, answer relevancy, and context precision [1][2][3][4].
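If you would rather not score reference-based metrics on those samples at all, one option (a sketch, not a built-in Ragas feature; it assumes `testset` is the object returned by `generate_with_llamaindex_docs`) is to drop the sentinel rows before evaluation:

```python
df = testset.to_pandas()
SENTINEL = "The answer to given question is not present in context"

# Keep only samples whose generated ground_truth is a real answer
usable = df[df["ground_truth"] != SENTINEL].reset_index(drop=True)
print(f"kept {len(usable)} of {len(df)} generated samples")
```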
@dosu from where and how does Ragas create 'Contexts' using TestSetGenerator?
Ragas creates 'Contexts' using TestsetGenerator by initializing it with the necessary components such as language models, embeddings, and a document store. This is done using the from_langchain or from_llama_index class methods. Here is the relevant code:
```python
@dataclass
class TestsetGenerator:
    generator_llm: BaseRagasLLM
    critic_llm: BaseRagasLLM
    embeddings: BaseRagasEmbeddings
    docstore: DocumentStore

    @classmethod
    def from_langchain(
        cls,
        generator_llm: LangchainLLM,
        critic_llm: LangchainLLM,
        embeddings: LangchainEmbeddings,
        docstore: t.Optional[DocumentStore] = None,
        run_config: RunConfig = RunConfig(),
        chunk_size: int = 1024,
    ) -> "TestsetGenerator":
        generator_llm_model = LangchainLLMWrapper(generator_llm)
        critic_llm_model = LangchainLLMWrapper(critic_llm)
        embeddings_model = LangchainEmbeddingsWrapper(embeddings)
        keyphrase_extractor = KeyphraseExtractor(llm=generator_llm_model)
        if docstore is None:
            from langchain.text_splitter import TokenTextSplitter

            splitter = TokenTextSplitter(chunk_size=chunk_size, chunk_overlap=0)
            docstore = InMemoryDocumentStore(
                splitter=splitter,
                embeddings=embeddings_model,
                extractor=keyphrase_extractor,
                run_config=run_config,
            )
        return cls(
            generator_llm=generator_llm_model,
            critic_llm=critic_llm_model,
            embeddings=embeddings_model,
            docstore=docstore,
        )

    @classmethod
    def from_llama_index(
        cls,
        generator_llm: LlamaindexLLM,
        critic_llm: LlamaindexLLM,
        embeddings: LlamaIndexEmbeddings,
        docstore: t.Optional[DocumentStore] = None,
        run_config: RunConfig = RunConfig(),
    ) -> "TestsetGenerator":
        generator_llm_model = LlamaIndexLLMWrapper(generator_llm)
        critic_llm_model = LlamaIndexLLMWrapper(critic_llm)
        embeddings_model = LlamaIndexEmbeddingsWrapper(embeddings)
        keyphrase_extractor = KeyphraseExtractor(llm=generator_llm_model)
        if docstore is None:
            from langchain.text_splitter import TokenTextSplitter

            splitter = TokenTextSplitter(chunk_size=1024, chunk_overlap=0)
            docstore = InMemoryDocumentStore(
                splitter=splitter,
                embeddings=embeddings_model,
                extractor=keyphrase_extractor,
                run_config=run_config,
            )
        return cls(
            generator_llm=generator_llm_model,
            critic_llm=critic_llm_model,
            embeddings=embeddings_model,
            docstore=docstore,
        )
```
This code shows how TestsetGenerator is initialized with the necessary components to create contexts [1].
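Concretely, the 'contexts' column of the generated testset holds the text of the docstore nodes that were selected for each generated question, so you can inspect it directly. A small sketch, again assuming `testset` is the object returned by `generate_with_llamaindex_docs`:

```python
df = testset.to_pandas()

# Each row's "contexts" entry is a list of chunk strings taken from docstore nodes
for question, contexts in zip(df["question"], df["contexts"]):
    print(question)
    for chunk in contexts:
        print("  context chunk:", chunk[:80], "...")
```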
@dosu Let me rephrase my question. To evaluate my RAG application built on LlamaIndex, I am using RAGAS. To evaluate using RAGAS, we first need to generate the testset using TestsetGenerator with the code below:
```python
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

generator = TestsetGenerator.from_llama_index(
    generator_llm=generator_llm,
    critic_llm=critic_llm,
    embeddings=embeddings,
)

# generate testset
testset = generator.generate_with_llamaindex_docs(
    documents,
    test_size=5,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)
```
This testset contains the columns 'question', 'contexts', 'ground_truth', 'evolution_type', 'metadata', and 'episode_done'.
When I did the evaluation on the testset using the code below:
```python
from ragas.integrations.llama_index import evaluate

ds = testset.to_dataset()
ds_dict = ds.to_dict()

result = evaluate(
    query_engine=query_engine,
    metrics=metrics,
    dataset=ds_dict,
    llm=evaluator_llm,
    embeddings=OpenAIEmbedding(),
)
```
I found that for many questions the ground_truth comes out as 'The answer to given question is not present in context', but the answer column has a correct answer. Is RAGAS unable to generate the ground_truth properly, or are the contexts not being fetched properly? How do I handle such cases?
The ground_truth comes out as 'The answer to given question is not present in context' when the context chunks selected during test set generation do not explicitly contain the information needed to answer the generated question. The answer column, by contrast, is produced later by your query engine during evaluation, which may retrieve different context, so the two can disagree.
To handle such cases, ensure that the contexts retrieved by the query engine are relevant and contain the information needed to answer the question. You can improve the retrieval process by:
- Improving the quality of the embeddings: Ensure that the embeddings used for document retrieval are of high quality and accurately represent the content.
- Tuning the retrieval parameters: Adjust parameters such as `similarity_top_k` to retrieve more relevant contexts (see the sketch after this list).
- Enhancing the document chunking strategy: Ensure that documents are chunked in a way that preserves the context needed to answer questions.
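For example, with a LlamaIndex `VectorStoreIndex` the number of retrieved chunks can be raised via `similarity_top_k`. A small sketch; the best value depends on your data and chunk size:

```python
from llama_index.core import VectorStoreIndex

vector_index = VectorStoreIndex.from_documents(documents)

# Retrieve more chunks per query so the needed information is more likely to be in the context
query_engine = vector_index.as_query_engine(similarity_top_k=5)
```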
Additionally, you can use the Ragas framework to objectively measure and tweak the performance of your QueryEngine. Here is a guide on how to integrate Ragas with LlamaIndex and evaluate your QueryEngine:
1. Load the documents:

```python
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("./nyc_wikipedia").load_data()
```

2. Initialize the `TestsetGenerator`:

```python
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

generator_llm = OpenAI(model="gpt-3.5-turbo-16k")
critic_llm = OpenAI(model="gpt-4")
embeddings = OpenAIEmbedding()

generator = TestsetGenerator.from_llama_index(
    generator_llm=generator_llm,
    critic_llm=critic_llm,
    embeddings=embeddings,
)
```

3. Generate the testset:

```python
testset = generator.generate_with_llamaindex_docs(
    documents,
    test_size=5,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)
df = testset.to_pandas()
```

4. Build the `QueryEngine`:

```python
from llama_index.core import VectorStoreIndex

vector_index = VectorStoreIndex.from_documents(documents)
query_engine = vector_index.as_query_engine()
```

5. Evaluate the `QueryEngine`:

```python
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_precision,
    context_recall,
)
from ragas.metrics.critique import harmfulness

metrics = [
    faithfulness,
    answer_relevancy,
    context_precision,
    context_recall,
    harmfulness,
]

from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

evaluator_llm = OpenAI(model="gpt-3.5-turbo")

ds = testset.to_dataset()
ds_dict = ds.to_dict()

from ragas.integrations.llama_index import evaluate

result = evaluate(
    query_engine=query_engine,
    metrics=metrics,
    dataset=ds_dict,
    llm=evaluator_llm,
    embeddings=OpenAIEmbedding(),
)
result.to_pandas()
```
By following these steps, you can ensure that the contexts retrieved are relevant to the questions, thereby avoiding the issue of 'The answer to given question is not present in context' [1][2][3].
@dosu I am using AzureAISearchVectorStore; what changes do I need to make to handle the above scenario?
To handle the scenario where the ground_truth is coming as 'The answer to given question is not present in context' while using AzureAISearchVectorStore, you need to ensure that the contexts retrieved are relevant and contain the necessary information. Here are the steps to integrate AzureAISearchVectorStore and improve the retrieval process:
1. Initialize Azure OpenAI Models and Embeddings:

```python
azure_configs = {
    "base_url": "https://<your-endpoint>.openai.azure.com/",
    "model_deployment": "your-deployment-name",
    "model_name": "your-model-name",
    "embedding_deployment": "your-deployment-name",
    "embedding_name": "text-embedding-ada-002",  # most likely
}

from langchain_openai.chat_models import AzureChatOpenAI
from langchain_openai.embeddings import AzureOpenAIEmbeddings

azure_model = AzureChatOpenAI(
    openai_api_version="2023-05-15",
    azure_endpoint=azure_configs["base_url"],
    azure_deployment=azure_configs["model_deployment"],
    model=azure_configs["model_name"],
    validate_base_url=False,
)

azure_embeddings = AzureOpenAIEmbeddings(
    openai_api_version="2023-05-15",
    azure_endpoint=azure_configs["base_url"],
    azure_deployment=azure_configs["embedding_deployment"],
    model=azure_configs["embedding_name"],
)
```

2. Initialize the `TestsetGenerator` with Azure Models and Embeddings:

```python
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

generator = TestsetGenerator.from_langchain(
    generator_llm=azure_model,
    critic_llm=azure_model,
    embeddings=azure_embeddings,
)
```

3. Generate the Testset:

```python
testset = generator.generate_with_langchain_docs(
    documents,
    test_size=5,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)
df = testset.to_pandas()
```

4. Build the `QueryEngine` with `AzureAISearchVectorStore`: Ensure that your `QueryEngine` is properly configured to use `AzureAISearchVectorStore` for context retrieval. This might involve setting up the vector store and ensuring it uses the Azure embeddings (see the sketch after this step).
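A rough sketch of that step. Assumptions: `vector_store` is your already-configured `AzureAISearchVectorStore` instance (its constructor arguments are omitted here; check the llama-index Azure AI Search integration docs for your version) and `azure_embed_model` is a LlamaIndex embedding model backed by the same Azure embedding deployment used for indexing.

```python
from llama_index.core import StorageContext, VectorStoreIndex

# vector_store: your configured AzureAISearchVectorStore (construction omitted)
# azure_embed_model: a LlamaIndex embedding model pointing at your Azure embedding deployment
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    embed_model=azure_embed_model,
)
query_engine = index.as_query_engine(similarity_top_k=5)
```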
5. Evaluate the `QueryEngine`:

```python
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_precision,
    context_recall,
)
from ragas.metrics.critique import harmfulness

metrics = [
    faithfulness,
    answer_relevancy,
    context_precision,
    context_recall,
    harmfulness,
]

evaluator_llm = AzureChatOpenAI(
    openai_api_version="2023-05-15",
    azure_endpoint=azure_configs["base_url"],
    azure_deployment=azure_configs["model_deployment"],
    model=azure_configs["model_name"],
    validate_base_url=False,
)

ds = testset.to_dataset()
ds_dict = ds.to_dict()

from ragas.integrations.llama_index import evaluate

result = evaluate(
    query_engine=query_engine,
    metrics=metrics,
    dataset=ds_dict,
    llm=evaluator_llm,
    embeddings=azure_embeddings,
)
result.to_pandas()
```
By following these steps, you can ensure that the contexts retrieved are relevant to the questions, thereby avoiding the issue of 'The answer to given question is not present in context' [1].
docstore no longer exists in recent versions. Check out KnowledgeGraph and the latest docs for current usage.
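For reference, a rough sketch of what test set generation looks like in the newer API. These names reflect my best understanding of ragas 0.2+ and may differ from the current release, so please verify against the latest documentation:

```python
from ragas.testset import TestsetGenerator
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper

# llm and embeddings are LangChain models; the old critic_llm/docstore arguments are gone
generator = TestsetGenerator(
    llm=LangchainLLMWrapper(llm),
    embedding_model=LangchainEmbeddingsWrapper(embeddings),
)

# Internally a KnowledgeGraph is built from the documents instead of a DocumentStore
dataset = generator.generate_with_langchain_docs(documents, testset_size=10)
```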
It seems the issue was answered, closing this now.