MultiHopAbstractQuerySynthesizer testset generation is not working.
[yes] I have checked the documentation and related resources and couldn't resolve my bug.
Describe the bug
When the query distribution is set to MultiHopAbstractQuerySynthesizer only, test generation fails.
Ragas version: 0.2.6
Python version: 3.9
Code to Reproduce
from ragas.testset.synthesizers import default_query_distribution
from ragas.testset import TestsetGenerator
query_distribution = default_query_distribution(llm=generator_llm)
new_q = [query_distribution[1]]  # select only MultiHopAbstractQuerySynthesizer
generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings, knowledge_graph=loaded_kg, persona_list=personas)
testset = generator.generate(testset_size=20, with_debugging_logs=True, query_distribution=new_q, raise_exceptions=False)
Error trace
When raise_exceptions=False
Exception raised in Job[0]: ValueError(No clusters found in the knowledge graph. Try changing the relationship condition.)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[58], line 1
----> 1 testset = generator.generate(testset_size=20, with_debugging_logs=True, query_distribution=new_q, raise_exceptions=False)
      2 testset.to_pandas()
File c:\Users\khahmed\AppData\Local\miniforge3\envs\ragas_v.0\lib\site-packages\ragas\testset\synthesizers\generate.py:434, in TestsetGenerator.generate(self, testset_size, query_distribution, num_personas, run_config, batch_size, callbacks, token_usage_parser, with_debugging_logs, raise_exceptions)
    432 additional_testset_info: t.List[t.Dict] = []
    433 for i, (synthesizer, _) in enumerate(query_distribution):
--> 434     for sample in scenario_sample_list[i]:
    435         exec.submit(
    436             synthesizer.generate_sample,
    437             scenario=sample,
    438             callbacks=sample_generation_grp,
    439         )
    440         # fill out the additional info for the TestsetSample
TypeError: 'float' object is not iterable
And when setting raise_exceptions=True
ValueError Traceback (most recent call last)
Cell In[52], line 1
----> 1 testset = generator.generate(testset_size=20, with_debugging_logs=True, query_distribution=new_q, raise_exceptions=True)
      2 testset.to_pandas()
File c:\Users\khahmed\AppData\Local\miniforge3\envs\ragas_v.0\lib\site-packages\ragas\testset\synthesizers\generate.py:413, in TestsetGenerator.generate(self, testset_size, query_distribution, num_personas, run_config, batch_size, callbacks, token_usage_parser, with_debugging_logs, raise_exceptions)
    411 except Exception as e:
    412     scenario_generation_rm.on_chain_error(e)
--> 413     raise e
    414 else:
    415     scenario_generation_rm.on_chain_end(
    416         outputs={"scenario_sample_list": scenario_sample_list}
    417     )
File c:\Users\khahmed\AppData\Local\miniforge3\envs\ragas_v.0\lib\site-packages\ragas\testset\synthesizers\generate.py:410, in TestsetGenerator.generate(self, testset_size, query_distribution, num_personas, run_config, batch_size, callbacks, token_usage_parser, with_debugging_logs, raise_exceptions)
    401 exec.submit(
    402     scenario.generate_scenarios,
    403     n=splits[i],
    (...)
    406     callbacks=scenario_generation_grp,
    407 )
    409 try:
--> 410     scenario_sample_list: t.List[t.List[BaseScenario]] = exec.results()
    411 except Exception as e:
...
     81 )
     82 num_sample_per_cluster = int(np.ceil(n / len(node_clusters)))
     84 for cluster in node_clusters:
ValueError: No clusters found in the knowledge graph. Try changing the relationship condition.
Output is truncated.
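A possible reading of the two traces, offered as an assumption rather than a confirmed diagnosis: with raise_exceptions=False, the failed scenario job seems to leave a float placeholder (such as NaN) in scenario_sample_list, and the later loop then tries to iterate over it, which would explain the secondary TypeError. The standalone sketch below reproduces that pattern without Ragas. The names run_jobs and failing_job are hypothetical stand-ins, not Ragas APIs.

```python
import math


def failing_job():
    # Stand-in for scenario generation that finds no clusters.
    raise ValueError("No clusters found in the knowledge graph.")


def run_jobs(jobs, raise_exceptions):
    """Run jobs in order; optionally swallow errors and store NaN instead.

    Assumption: this mimics how a failed job might be recorded when
    exceptions are suppressed. It is not the library's actual executor.
    """
    results = []
    for job in jobs:
        try:
            results.append(job())
        except Exception:
            if raise_exceptions:
                raise
            results.append(math.nan)  # float placeholder for the failed job
    return results


scenario_sample_list = run_jobs([failing_job], raise_exceptions=False)

# Iterating over the placeholder reproduces the secondary error.
try:
    for sample in scenario_sample_list[0]:
        pass
except TypeError as err:
    message = str(err)
    print(message)
```

If this reading is right, the TypeError is only a symptom, and the ValueError about missing clusters is the error to fix.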
Expected behavior
Test generation should produce samples successfully, as it does with SingleHopSpecificQuerySynthesizer and MultiHopSpecificQuerySynthesizer.
Additional context
My knowledge graph had the following structure:
KnowledgeGraph(nodes: 219, relationships: 794)
The relationships between the entities were built with JaccardSimilarityBuilder and OverlapScoreBuilder.
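Since the error message suggests changing the relationship condition, it may help to see which relationship types and property keys the graph actually contains. This is a minimal sketch, assuming each relationship exposes a type string and a properties dict. The helper summarize_relationships is hypothetical, and the stub graph uses made-up type and property names so the snippet runs without Ragas.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_relationships(kg):
    """Count relationship types and property keys in a knowledge graph."""
    type_counts = Counter(rel.type for rel in kg.relationships)
    property_counts = Counter(
        key for rel in kg.relationships for key in rel.properties
    )
    return type_counts, property_counts


# Stub graph for demonstration only; names below are placeholders.
demo_kg = SimpleNamespace(
    relationships=[
        SimpleNamespace(type="type_a", properties={"score_a": 0.4}),
        SimpleNamespace(type="type_a", properties={"score_a": 0.7}),
        SimpleNamespace(type="type_b", properties={"score_b": 0.2}),
    ]
)

type_counts, property_counts = summarize_relationships(demo_kg)
print(dict(type_counts))
print(dict(property_counts))
```

With a real graph, pass the loaded knowledge graph instead of demo_kg. If the synthesizer's condition looks for a relationship type or property that is absent from these counts, no clusters can be found regardless of how many relationships exist.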
I hit the same issue in my test generation.
Hey @bin1guo @kh-taher, the default settings may not be suitable for your documents. Please try configuring a setup that suits them using https://docs.ragas.io/en/latest/howtos/customizations/testgenerator/_testgen-customisation/
Same problem with v0.2.13, Python 3.10.12: "No clusters found in the knowledge graph. Try changing the relationship condition."
@kh-taher @bin1guo have you fixed it?
Previously working code does not work on the current version of Ragas, for this exact reason.
Customizing the query distribution merely changes the error from this one to "TypeError: object of type 'StringPromptValue' has no len()"
I have not gotten any test set generation to work, even if I just use a text file for the document. Lengthening my input document to 6500 characters does not resolve it. Can we get an update here, please? Does anyone have a solution to this? It seems like a recent update simply broke the entire test set generation system.
I get the same issue. In my case the knowledge graph builds for a smaller set of files, but running with the same parameters on a larger dataset throws "No nodes that satisfied the given filer"
@shahules786 any update on this? The problem is that the error is only triggered after several minutes (the time it takes to create the KG). It also happens with a single document when asking for just three questions.
I have seen this problem when HeadlineSplitter from default_transforms wasn't producing child chunk nodes for my documents, because my documents had no headlines. Once I added headlines, it worked just fine. I'm sure that's only one of the potential causes, though. I'd assume there are several other reasons that can lead to this error.
If that doesn't help, consider using only the single-hop specific query synthesizer:
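One way to check whether splitting produced any chunk nodes is to count the nodes by type after the transforms have run. This is a minimal sketch, assuming each node has a type attribute that is either an enum member or a plain string. The helper count_node_types and the DemoNodeType enum are hypothetical, and the stub graph stands in for a real one so the snippet runs without Ragas.

```python
from collections import Counter
from enum import Enum
from types import SimpleNamespace


def count_node_types(kg):
    """Return a Counter of node type labels found in the graph."""
    counts = Counter()
    for node in kg.nodes:
        node_type = node.type
        # Use the enum member name when available, else the raw string.
        label = node_type.name if isinstance(node_type, Enum) else str(node_type)
        counts[label] += 1
    return counts


class DemoNodeType(str, Enum):
    # Placeholder enum for the demo; not imported from any library.
    DOCUMENT = "document"
    CHUNK = "chunk"


# Stub graph with document nodes only, as if no headlines were found.
demo_kg = SimpleNamespace(
    nodes=[
        SimpleNamespace(type=DemoNodeType.DOCUMENT),
        SimpleNamespace(type=DemoNodeType.DOCUMENT),
    ]
)

node_counts = count_node_types(demo_kg)
has_chunks = node_counts["CHUNK"] > 0
print(dict(node_counts), has_chunks)
```

A graph that reports only document nodes would match the situation described above, where the splitter found no headlines to split on.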
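from ragas.run_config import RunConfig
from ragas.testset.synthesizers.single_hop.specific import SingleHopSpecificQuerySynthesizer

# generator_llm, generator, and testset_size are assumed to be defined already
custom_distribution = [
    (SingleHopSpecificQuerySynthesizer(llm=generator_llm), 1.0)
]
testset = generator.generate(
    testset_size=testset_size,  # number of test samples to generate
    query_distribution=custom_distribution,
    run_config=RunConfig(),
)
That one should work even on small-ish graphs.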