MultiHopAbstractQuerySynthesizer testset generation is not working.

Open kh-taher opened this issue 1 year ago • 7 comments

[yes] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug

When the query distribution is set to only MultiHopAbstractQuerySynthesizer, test generation fails.

Ragas version: 0.2.6
Python version: 3.9

Code to Reproduce

from ragas.testset.synthesizers import default_query_distribution
from ragas.testset import TestsetGenerator

query_distribution = default_query_distribution(llm=generator_llm)
new_q = [query_distribution[1]]  # select only MultiHopAbstractQuerySynthesizer
generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings, knowledge_graph=loaded_kg, persona_list=personas)
testset = generator.generate(testset_size=20, with_debugging_logs=True, query_distribution=new_q, raise_exceptions=False)

Error trace

When raise_exceptions=False

Exception raised in Job[0]: ValueError(No clusters found in the knowledge graph. Try changing the relationship condition.)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[58], line 1
----> 1 testset = generator.generate(testset_size=20, with_debugging_logs=True, query_distribution=new_q, raise_exceptions=False)
      2 testset.to_pandas()

File c:\Users\khahmed\AppData\Local\miniforge3\envs\ragas_v.0\lib\site-packages\ragas\testset\synthesizers\generate.py:434, in TestsetGenerator.generate(self, testset_size, query_distribution, num_personas, run_config, batch_size, callbacks, token_usage_parser, with_debugging_logs, raise_exceptions)
    432 additional_testset_info: t.List[t.Dict] = []
    433 for i, (synthesizer, _) in enumerate(query_distribution):
--> 434     for sample in scenario_sample_list[i]:
    435         exec.submit(
    436             synthesizer.generate_sample,
    437             scenario=sample,
    438             callbacks=sample_generation_grp,
    439         )
    440         # fill out the additional info for the TestsetSample

TypeError: 'float' object is not iterable

And when setting raise_exceptions=True

ValueError                                Traceback (most recent call last)
Cell In[52], line 1
----> 1 testset = generator.generate(testset_size=20, with_debugging_logs=True, query_distribution=new_q, raise_exceptions=True)
      2 testset.to_pandas()

File c:\Users\khahmed\AppData\Local\miniforge3\envs\ragas_v.0\lib\site-packages\ragas\testset\synthesizers\generate.py:413, in TestsetGenerator.generate(self, testset_size, query_distribution, num_personas, run_config, batch_size, callbacks, token_usage_parser, with_debugging_logs, raise_exceptions)
    411 except Exception as e:
    412     scenario_generation_rm.on_chain_error(e)
--> 413     raise e
    414 else:
    415     scenario_generation_rm.on_chain_end(
    416         outputs={"scenario_sample_list": scenario_sample_list}
    417     )

File c:\Users\khahmed\AppData\Local\miniforge3\envs\ragas_v.0\lib\site-packages\ragas\testset\synthesizers\generate.py:410, in TestsetGenerator.generate(self, testset_size, query_distribution, num_personas, run_config, batch_size, callbacks, token_usage_parser, with_debugging_logs, raise_exceptions)
    401     exec.submit(
    402         scenario.generate_scenarios,
    403         n=splits[i],
   (...)
    406         callbacks=scenario_generation_grp,
    407     )
    409 try:
--> 410     scenario_sample_list: t.List[t.List[BaseScenario]] = exec.results()
    411 except Exception as e:
...
     81     )
     82 num_sample_per_cluster = int(np.ceil(n / len(node_clusters)))
     84 for cluster in node_clusters:

ValueError: No clusters found in the knowledge graph. Try changing the relationship condition.
Output is truncated.
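
Read together, the two traces look like one failure seen twice. A plausible reading, not confirmed anywhere in this thread: with raise_exceptions=False the failed scenario job leaves a float placeholder such as NaN where a list was expected, and the later loop over scenario_sample_list[i] then raises the TypeError. The sketch below reproduces that pattern in plain Python. `generate_scenarios` and `run_jobs` are hypothetical stand-ins, not Ragas functions.

```python
import math


def generate_scenarios(clusters):
    """Hypothetical stand-in for a scenario job that needs at least one cluster."""
    if not clusters:
        raise ValueError("No clusters found in the knowledge graph.")
    return [f"scenario for {c}" for c in clusters]


def run_jobs(jobs, raise_exceptions):
    """Run each job; on failure either re-raise or store NaN (assumed behavior)."""
    results = []
    for job in jobs:
        try:
            results.append(job())
        except Exception:
            if raise_exceptions:
                raise
            results.append(math.nan)  # a float placeholder hides the real error
    return results


# With exceptions suppressed, the ValueError is swallowed and NaN is stored.
results = run_jobs([lambda: generate_scenarios([])], raise_exceptions=False)

masked_error = None
try:
    # Same shape as `for sample in scenario_sample_list[i]` in the traceback.
    for sample in results[0]:
        pass
except TypeError as err:
    masked_error = str(err)
```

If this reading is right, the TypeError is only a symptom, and the ValueError shown with raise_exceptions=True is the error worth debugging.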

Expected behavior

Test generation should produce samples successfully, as it does with SingleHopSpecificQuerySynthesizer and MultiHopSpecificQuerySynthesizer.

Additional context

My knowledge graph had the following structure:

KnowledgeGraph(nodes: 219, relationships: 794)

The relationships between the entities were built with JaccardSimilarityBuilder and OverlapScoreBuilder.
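
Since the error message points at the relationship condition, a first check is which relationship types the graph actually holds. This is a minimal sketch, assuming each relationship object carries a `type` attribute and that the graph exposes its edges as `loaded_kg.relationships`. The demo objects and their type labels are made up for illustration and may not match the strings Ragas stores.

```python
from collections import Counter
from types import SimpleNamespace


def count_relationship_types(relationships):
    """Count relationships by their `type` attribute (attribute name is an assumption)."""
    return Counter(rel.type for rel in relationships)


# Stand-in objects; with a real graph, pass `loaded_kg.relationships` instead
# (assuming the graph exposes its edges under that name).
demo_relationships = [
    SimpleNamespace(type="jaccard_similarity"),
    SimpleNamespace(type="jaccard_similarity"),
    SimpleNamespace(type="overlap_score"),
]

type_counts = count_relationship_types(demo_relationships)
for rel_type, count in type_counts.most_common():
    print(f"{rel_type}: {count}")
```

If the synthesizer filters on a relationship type or property that never appears in this tally, finding no clusters would be the expected outcome even with hundreds of relationships in the graph.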

kh-taher avatar Nov 21 '24 13:11 kh-taher

I hit the same issue in my test generation.

bin1guo avatar Nov 26 '24 06:11 bin1guo

Hey @bin1guo @kh-taher, the default settings may not be suitable for your documents. Please try configuring one that suits them using https://docs.ragas.io/en/latest/howtos/customizations/testgenerator/_testgen-customisation/

shahules786 avatar Dec 05 '24 11:12 shahules786

Same problem on v0.2.13, Python 3.10.12: "No clusters found in the knowledge graph. Try changing the relationship condition."

@kh-taher @bin1guo, have you fixed it?

shawn-maxiao avatar Feb 15 '25 02:02 shawn-maxiao

Previously working code does not work on the current version of Ragas, for this exact reason.

Customizing the query distribution merely changes the error from this one to "TypeError: object of type 'StringPromptValue' has no len()"

I have not gotten any test set generation to work, even if I just use a text file for the document. Lengthening my input document to 6500 characters does not resolve it. Can we get an update here, please? Does anyone have a solution to this? It seems like a recent update simply broke the entire test set generation system.

walesdata avatar Mar 04 '25 13:03 walesdata

I get the same issue. In my case the knowledge graph builds for a smaller set of files, but running with the same parameters on a larger dataset throws "No nodes that satisfied the given filer".

sjjpo2002 avatar Mar 09 '25 22:03 sjjpo2002

@shahules786 any update on this? The problem is that the error is only triggered after several minutes (the time it takes to build the knowledge graph). It also happens with a single document when asking for just three questions.

mrtnzagustin avatar Jun 09 '25 13:06 mrtnzagustin

I have seen this problem when HeadlineSplitter from default_transforms wasn't producing child chunk nodes for my documents, because my documents had no headlines. Once I added headlines, it worked just fine. I'm sure that's only one of the potential causes, though; I'd assume there are multiple other reasons that can lead to this error.
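
One way to test the headline explanation before digging further is to tally the node types in the graph and see whether any chunk nodes exist at all. This is a rough sketch, assuming each node has a `type` attribute that is either a plain string or an enum with a `.value`, and that chunk nodes are labeled "chunk". The demo nodes are stand-ins, not real Ragas objects.

```python
from collections import Counter
from types import SimpleNamespace


def count_node_types(nodes):
    """Count nodes per type; accepts enum-like or plain-string `type` values."""
    return Counter(str(getattr(node.type, "value", node.type)) for node in nodes)


# Stand-in graph in which splitting produced no chunks (illustrative only).
demo_nodes = [SimpleNamespace(type="document"), SimpleNamespace(type="document")]

node_counts = count_node_types(demo_nodes)
has_chunks = node_counts.get("chunk", 0) > 0  # "chunk" label is an assumption
print(node_counts, "chunk nodes present:", has_chunks)
```

A graph that contains only document-level nodes would be consistent with the splitter having found nothing to split on.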

If that doesn't help, consider using only the single-hop specific query synthesizer:

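    from ragas.run_config import RunConfig
    from ragas.testset.synthesizers import SingleHopSpecificQuerySynthesizer

    # Use only the single-hop specific synthesizer, with weight 1.0
    custom_distribution = [
        (SingleHopSpecificQuerySynthesizer(llm=generator_llm), 1.0)
    ]

    testset = generator.generate(
        testset_size=testset_size,  # Number of test samples to generate
        query_distribution=custom_distribution,
        run_config=RunConfig(),
    )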

That one should work even on small-ish graphs.
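
The same idea can be written as a fallback: try the multi-hop distribution first and switch to the single-hop one only if clustering fails. The helper below is a hypothetical sketch, not part of Ragas. It assumes the failure surfaces as a ValueError, which the traceback earlier in this thread shows happening when raise_exceptions=True.

```python
def generate_with_fallback(generate, primary_distribution, fallback_distribution):
    """Try the primary distribution; on ValueError retry with the fallback.

    `generate` is any callable that takes a query distribution and returns a
    test set. Returns the test set and a label saying which attempt produced it.
    """
    try:
        return generate(primary_distribution), "primary"
    except ValueError:
        return generate(fallback_distribution), "fallback"


def fake_generate(distribution):
    """Stand-in generator that fails for the multi-hop distribution only."""
    if distribution == "multi_hop_abstract":
        raise ValueError("No clusters found in the knowledge graph.")
    return [f"sample from {distribution}"]


testset, used = generate_with_fallback(
    fake_generate, "multi_hop_abstract", "single_hop_specific"
)
```

With a real generator, `generate` could be a lambda that wraps `generator.generate(..., query_distribution=dist, raise_exceptions=True)`, so that the clustering error is raised rather than swallowed.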

borowis avatar Jul 16 '25 12:07 borowis