ragas: Can I generate only multi-hop reference_contexts (without the question and reference)?

Your Question I was wondering whether the API supports generating only the reference_contexts rather than running the whole flow. In addition, where can I find the actual links the context is based on?

Code Examples Currently I am doing:

from ragas.testset import TestsetGenerator
from ragas.testset.synthesizers.multi_hop.specific import MultiHopSpecificQuerySynthesizer

# This snippet runs inside a class method, hence the references to self.
query_distribution = [(MultiHopSpecificQuerySynthesizer(llm=self.generator_llm), 1)]
generator = TestsetGenerator(llm=self.generator_llm, embedding_model=self.generator_embeddings)
dataset = generator.generate_with_langchain_docs(docs, testset_size=self.n_generations, query_distribution=query_distribution)
chunks = []
for sample in dataset.samples:
    # Skip samples that have no reference contexts.
    if sample.eval_sample.reference_contexts:
        for context in sample.eval_sample.reference_contexts:
            # Build a simple chunk dict with page_content and metadata.
            chunk = {
                "page_content": context,
                "metadata": {"Links": "TODO"},
            }
            chunks.append(chunk)

Thanks in advance.

Feb 25 '25 11:02 GraderYuval

Hi @GraderYuval,

The API doesn't support generating only the reference_contexts. Each testset sample is created by the _generate_sample function, which returns:

return SingleTurnSample(
    user_input=response.query,
    reference=response.answer,
    reference_contexts=reference_context,
)

Could you clarify what you mean by "the actual links the context is based on"?

For context, the current testset generation flow works like this: the knowledge graph (KG) is constructed, clusters are identified from the KG, scenarios are generated based on those clusters, and then the testset sample is created.
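Since every sample comes back with all three fields, keeping only the contexts is a post-processing step rather than a generator option. The sketch below shows one way to do that. SampleStub and EvalSampleStub are hypothetical stand-ins for the objects in dataset.samples so the example runs without an LLM; contexts_only is an illustrative helper, not part of ragas.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EvalSampleStub:
    """Hypothetical stand-in for the SingleTurnSample fields used here."""
    user_input: str
    reference: str
    reference_contexts: Optional[List[str]] = None


@dataclass
class SampleStub:
    """Hypothetical stand-in for one entry of dataset.samples."""
    eval_sample: EvalSampleStub


def contexts_only(samples) -> List[List[str]]:
    """Return one list of reference contexts per sample, dropping the
    generated question and reference answer."""
    return [list(s.eval_sample.reference_contexts or []) for s in samples]


samples = [
    SampleStub(EvalSampleStub("q1", "a1", ["ctx A", "ctx B"])),
    SampleStub(EvalSampleStub("q2", "a2", None)),  # sample without contexts
]
grouped = contexts_only(samples)
```

Keeping one inner list per sample preserves which contexts belong to the same multi-hop scenario; flatten it only if that grouping is not needed. The question and reference are still generated (and paid for) upstream, so this saves nothing on LLM calls.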

Mar 04 '25 10:03 sahusiddharth

Hi, thank you very much for the reply. Regarding the "actual links...", I meant that if I use URLs to obtain the context, I want to know which specific URL each context hop came from. For example, if we are crawling a webpage and creating a multi-hop context from it, can I retrieve from the dataloader ('docs') the actual metadata of the URLs from which the context was extracted?

Mar 15 '25 17:03 GraderYuval
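One possible way to recover the source URL for each context hop, offered as a sketch rather than a supported ragas feature, is to match each reference context back against the original docs and read the URL from their metadata. The assumptions here are loud ones: Doc is a hypothetical stand-in for a LangChain document, the URL is assumed to live under metadata["source"], contexts are assumed to be verbatim excerpts of a document's page_content, and they are assumed to possibly carry a hop marker such as `<1-hop>` at the start. The example.com URLs are placeholders.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain document."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


# Assumed marker format at the start of a multi-hop context.
HOP_PREFIX = re.compile(r"^<\d+-hop>\s*")


def normalize(text: str) -> str:
    """Strip an assumed hop marker and collapse whitespace for matching."""
    return " ".join(HOP_PREFIX.sub("", text).split())


def sources_for(context: str, docs: List[Doc], key: str = "source") -> List[str]:
    """Return metadata[key] of every doc whose text contains the context."""
    needle = normalize(context)
    return [
        d.metadata.get(key, "unknown")
        for d in docs
        if needle and needle in normalize(d.page_content)
    ]


docs = [
    Doc("Ragas builds a knowledge graph.\nIt then finds clusters.",
        {"source": "https://example.com/a"}),
    Doc("Scenarios are generated from clusters.",
        {"source": "https://example.com/b"}),
]
contexts = [
    "<1-hop>\n\nIt then finds clusters.",
    "<2-hop>\n\nScenarios are generated from clusters.",
]
# Same chunk shape as in the question, with "Links" filled in.
chunks = [
    {"page_content": c, "metadata": {"Links": sources_for(c, docs)}}
    for c in contexts
]
```

Exact substring matching will miss contexts that were rewritten or re-chunked in a way that changes the text, in which case the "Links" list comes back empty and a fuzzier match would be needed.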