
Can I generate only multi-hop reference_contexts (without the question and reference)?

Open GraderYuval opened this issue 10 months ago • 2 comments

Your Question: I was wondering whether the API supports generating only the reference_contexts rather than running the whole flow. In addition, where can I find the actual links the context is based on?

Code Examples Currently I am doing:

from ragas.testset import TestsetGenerator
from ragas.testset.synthesizers import MultiHopSpecificQuerySynthesizer

# Use only the multi-hop synthesizer (weight 1).
query_distribution = [(MultiHopSpecificQuerySynthesizer(llm=self.generator_llm), 1)]
generator = TestsetGenerator(llm=self.generator_llm, embedding_model=self.generator_embeddings)
dataset = generator.generate_with_langchain_docs(
    docs, testset_size=self.n_generations, query_distribution=query_distribution
)
chunks = []
for sample in dataset.samples:
    # Check if the sample has any reference contexts.
    if sample.eval_sample.reference_contexts:
        for context in sample.eval_sample.reference_contexts:
            # Create a simple chunk object with page_content and metadata.
            chunk = {
                "page_content": context,
                "metadata": {"Links": "TODO"},
            }
            chunks.append(chunk)

Thanks in advance.

GraderYuval, Feb 25 '25 11:02

Hi @GraderYuval,

The API doesn't support generating only the reference_contexts; each testset sample is created by the _generate_sample function, which returns:

return SingleTurnSample(
    user_input=response.query,
    reference=response.answer,
    reference_contexts=reference_context,
)

Could you clarify what you mean by "the actual links the context is based on"?

For context, the current testset generation flow works like this: the knowledge graph (KG) is constructed, clusters are identified from the KG, scenarios are generated based on those clusters, and then the testset sample is created.
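
As a rough illustration of those four stages, here is a toy sketch in plain Python. It is not the ragas implementation: the `entities` tags, the helper names (`build_graph`, `find_clusters`, `make_scenarios`, `generate_sample`), and the stubbed question and answer are all hypothetical, and the local `Sample` class only mirrors the three fields returned above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sample:
    # Mirrors the three fields of the returned sample; not the ragas class.
    user_input: str
    reference: str
    reference_contexts: list


def build_graph(chunks):
    """Stage 1 (toy KG): group chunk texts by a hypothetical 'entities' tag."""
    by_entity = defaultdict(list)
    for chunk in chunks:
        for entity in chunk["entities"]:
            by_entity[entity].append(chunk["text"])
    return by_entity


def find_clusters(graph):
    """Stage 2: keep only entities that connect at least two chunks."""
    return {e: texts for e, texts in graph.items() if len(texts) >= 2}


def make_scenarios(clusters):
    """Stage 3: one scenario per cluster, carrying the contexts to use."""
    return [{"topic": e, "contexts": t} for e, t in sorted(clusters.items())]


def generate_sample(scenario):
    """Stage 4: a real generator would call an LLM here; this stub fakes it."""
    return Sample(
        user_input=f"What connects the passages about {scenario['topic']}?",
        reference="(LLM-written answer would go here)",
        reference_contexts=scenario["contexts"],
    )


chunks = [
    {"text": "Paris lies on the Seine.", "entities": ["Seine", "Paris"]},
    {"text": "The Seine flows into the English Channel.", "entities": ["Seine"]},
]
scenarios = make_scenarios(find_clusters(build_graph(chunks)))
samples = [generate_sample(s) for s in scenarios]
```

In this toy version the query, the answer, and the contexts all end up in the same sample object, which mirrors the shape of the return value shown above.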

sahusiddharth, Mar 04 '25 10:03

Hi, thank you very much for the reply. Regarding the "actual links...", I meant that if I use URLs to obtain the context, I want to know which specific URL each context hop came from. For example, if we are crawling a webpage and creating a multi-hop context from it, can I retrieve from the dataloader ('docs') the actual metadata of the URLs from which the context was extracted?

GraderYuval, Mar 15 '25 17:03
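
One possible workaround for the URL question, not confirmed by the maintainers in this thread: match each reference context back to the loaded documents by text and copy the URL from the matching document's metadata. The sketch below rests on several assumptions: each context is a verbatim substring of some document's page_content, the URL is stored under a metadata key named "source", and multi-hop contexts may carry a prefix such as "<1-hop>" that has to be stripped first (check this against the output of your ragas version). `Doc` is a stand-in for a LangChain document and `find_source_urls` is a hypothetical helper.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a LangChain Document (page_content plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# Assumed marker format for multi-hop contexts, e.g. "<1-hop>\n\n...".
HOP_PREFIX = re.compile(r"^<\d+-hop>\s*")


def find_source_urls(context, docs, url_key="source"):
    """Return the URLs of all docs whose text contains the given context."""
    text = HOP_PREFIX.sub("", context).strip()
    if not text:
        return []
    return [d.metadata.get(url_key) for d in docs if text in d.page_content]


docs = [
    Doc("Paris is the capital of France. It lies on the Seine.",
        {"source": "https://example.com/paris"}),
    Doc("The Seine flows into the English Channel.",
        {"source": "https://example.com/seine"}),
]
contexts = [
    "<1-hop>\n\nParis is the capital of France.",
    "<2-hop>\n\nThe Seine flows into the English Channel.",
]

# Same chunk shape as in the original snippet, with "Links" filled in.
chunks = [
    {"page_content": c, "metadata": {"Links": find_source_urls(c, docs)}}
    for c in contexts
]
```

Exact substring matching breaks if the contexts are re-chunked or normalized differently from the source documents; in that case a fuzzy match or an ID carried through the pipeline would be needed instead.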