Can't generate test dataset: always a connection error and "Event loop is closed".
[x] I have checked the documentation and related resources and couldn't resolve my bug.
Describe the bug
The LLM is served by Ollama, so there should be no connection issue, and it is able to extract key phrases, but it always fails during generation.
Ragas version: latest source code
Python version: 3.10
Code to Reproduce

```python
from typing import Dict
from langchain_community.document_loaders import DirectoryLoader
from langchain_openai import ChatOpenAI
from langchain_community.embeddings import HuggingFaceEmbeddings
from llama_index.llms.openai import utils
from ragas import RunConfig
from ragas.testset.evolutions import simple, reasoning, multi_context
from ragas.testset.generator import TestDataset, TestsetGenerator

loader = DirectoryLoader("/Users/xxx/Projects/graphrag/input")
documents = loader.load()
for document in documents:
    document.metadata['filename'] = document.metadata['source']

import logging
logging.basicConfig(level=logging.DEBUG)

generator_llm = ChatOpenAI(model_name="phi3.5:latest", openai_api_base="http://localhost:11434/v1", temperature=0.7)
critic_llm = ChatOpenAI(model_name="phi3.5:latest", openai_api_base="http://localhost:11434/v1", temperature=0.7)
hf_embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-base-zh-v1.5", model_kwargs={"device": "mps"})

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    hf_embeddings,
    run_config=RunConfig(max_workers=1, seed=42, max_retries=3, log_tenacity=True, timeout=180),
)

# generate testset
testset: TestDataset = generator.generate_with_langchain_docs(
    documents,
    test_size=10,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
    run_config=RunConfig(max_workers=1, seed=42, max_retries=1, log_tenacity=True),
    with_debugging_logs=True,
    is_async=False,
)
ds = testset.to_dataset()
ds.save_to_disk("./activity_testset")
```
Error trace

```
DEBUG:openai._base_client:Request options: {'method': 'post', 'url': '/chat/completions', 'files': None, 'json_data': {'messages': [{'content': 'Generate a question that can be fully answered from given context. The question should be formed using topic\n\nExamples:\n\ncontext: "Photosynthesis in plants involves converting light energy into chemical energy, using chlorophyll and other pigments to absorb light. This process is crucial for plant growth and the production of oxygen."\nkeyphrase: "Photosynthesis"\nquestion: "What is the role of photosynthesis in plant growth?"\n\ncontext: "The Industrial Revolution, starting in the 18th century, marked a major turning point in history as it led to the development of factories and urbanization."\nkeyphrase: "Industrial Revolution"\nquestion: "How did the Industrial Revolution mark a major turning point in history?"\n\ncontext: "The process of evaporation plays a crucial role in the water cycle, converting water from liquid to vapor and allowing it to rise into the atmosphere."\nkeyphrase: "Evaporation"\nquestion: "Why is evaporation important in the water cycle?"\n\nYour actual task:\n\ncontext: "在2023年10月10日,陈刚带着他的文化交流团队,共计5人,以及多件传统工艺品,前往杭州的一个文化展览中心。当天,他们参加了一场文化交流展览。陈刚和他的团队展示了他们精心制作的传统手工艺品,并现场演示了制作过程。他们的展品吸引了大量参观者的关注,并获得了文化界人士的高度评价。展览结束后,他们还与其他文化工作者进行了交流,探讨了传统文化的传承和创新。"\nkeyphrase: "杭州的文化展览中心"\nquestion: \n', 'role': 'user'}], 'model': 'phi3.5:latest', 'n': 1, 'stream': False, 'temperature': 0}}
DEBUG:httpcore.connection:close.started
DEBUG:httpcore.connection:close.failed exception=RuntimeError('Event loop is closed')
DEBUG:openai._base_client:Encountered Exception
Traceback (most recent call last):
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/openai/_base_client.py", line 1564, in _request
response = await self._client.send(
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpx/_client.py", line 1674, in send
response = await self._send_handling_auth(
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpx/_client.py", line 1702, in _send_handling_auth
response = await self._send_handling_redirects(
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpx/_client.py", line 1739, in _send_handling_redirects
response = await self._send_single_request(request)
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpx/_client.py", line 1776, in _send_single_request
response = await transport.handle_async_request(request)
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpx/_transports/default.py", line 377, in handle_async_request
resp = await self._pool.handle_async_request(req)
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpcore/_async/connection_pool.py", line 216, in handle_async_request
raise exc from None
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpcore/_async/connection_pool.py", line 189, in handle_async_request
await self._close_connections(closing)
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpcore/_async/connection_pool.py", line 305, in _close_connections
await connection.aclose()
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpcore/_async/connection.py", line 171, in aclose
await self._connection.aclose()
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpcore/_async/http11.py", line 265, in aclose
await self._network_stream.aclose()
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpcore/_backends/anyio.py", line 55, in aclose
await self._stream.aclose()
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 1258, in aclose
self._transport.close()
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/asyncio/selector_events.py", line 706, in close
self._loop.call_soon(self._call_connection_lost, None)
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/asyncio/base_events.py", line 753, in call_soon
self._check_closed()
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/asyncio/base_events.py", line 515, in _check_closed
raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
DEBUG:openai._base_client:Raising connection error
Generating: 0%| | 0/10 [00:29<?, ?it/s]
Traceback (most recent call last):
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/openai/_base_client.py", line 1564, in _request
response = await self._client.send(
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpx/_client.py", line 1674, in send
response = await self._send_handling_auth(
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpx/_client.py", line 1702, in _send_handling_auth
response = await self._send_handling_redirects(
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpx/_client.py", line 1739, in _send_handling_redirects
response = await self._send_single_request(request)
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpx/_client.py", line 1776, in _send_single_request
response = await transport.handle_async_request(request)
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpx/_transports/default.py", line 377, in handle_async_request
resp = await self._pool.handle_async_request(req)
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpcore/_async/connection_pool.py", line 216, in handle_async_request
raise exc from None
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpcore/_async/connection_pool.py", line 189, in handle_async_request
await self._close_connections(closing)
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpcore/_async/connection_pool.py", line 305, in _close_connections
await connection.aclose()
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpcore/_async/connection.py", line 171, in aclose
await self._connection.aclose()
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpcore/_async/http11.py", line 265, in aclose
await self._network_stream.aclose()
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpcore/_backends/anyio.py", line 55, in aclose
await self._stream.aclose()
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 1258, in aclose
self._transport.close()
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/asyncio/selector_events.py", line 706, in close
self._loop.call_soon(self._call_connection_lost, None)
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/asyncio/base_events.py", line 753, in call_soon
self._check_closed()
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/asyncio/base_events.py", line 515, in _check_closed
raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/evilkylin/Projects/starag-eva/gen.py", line 44, in <module>
testset: TestDataset = generator.generate_with_langchain_docs(documents,
File "/Users/evilkylin/Projects/ragas/src/ragas/testset/generator.py", line 210, in generate_with_langchain_docs
return self.generate(
File "/Users/evilkylin/Projects/ragas/src/ragas/_analytics.py", line 129, in wrapper
result = func(*args, **kwargs)
File "/Users/evilkylin/Projects/ragas/src/ragas/testset/generator.py", line 304, in generate
test_data_rows = exec.results()
File "/Users/evilkylin/Projects/ragas/src/ragas/executor.py", line 118, in results
results = asyncio.run(_aresults())
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/Users/evilkylin/Projects/ragas/src/ragas/executor.py", line 113, in _aresults
r = await future
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/asyncio/tasks.py", line 571, in _wait_for_one
return f.result() # May raise f.exception().
File "/Users/evilkylin/Projects/ragas/src/ragas/executor.py", line 34, in sema_coro
return await coro
File "/Users/evilkylin/Projects/ragas/src/ragas/executor.py", line 61, in wrapped_callable_async
raise e
File "/Users/evilkylin/Projects/ragas/src/ragas/executor.py", line 55, in wrapped_callable_async
result = await callable(*args, **kwargs)
File "/Users/evilkylin/Projects/ragas/src/ragas/testset/evolutions.py", line 143, in evolve
) = await self._aevolve(current_tries, current_nodes)
File "/Users/evilkylin/Projects/ragas/src/ragas/testset/evolutions.py", line 467, in _aevolve
simple_question, current_nodes, _ = await self.se._aevolve(
File "/Users/evilkylin/Projects/ragas/src/ragas/testset/evolutions.py", line 304, in _aevolve
results = await self.generator_llm.generate(
File "/Users/evilkylin/Projects/ragas/src/ragas/llms/base.py", line 96, in generate
return await agenerate_text_with_retry(
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 189, in async_wrapped
return await copy(fn, *args, **kwargs)
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 111, in __call__
do = await self.iter(retry_state=retry_state)
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 153, in iter
result = await action(retry_state)
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/tenacity/_utils.py", line 99, in inner
return call(*args, **kwargs)
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/tenacity/__init__.py", line 398, in <lambda>
self._add_action_func(lambda rs: rs.outcome.result())
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 114, in __call__
result = await fn(*args, **kwargs)
File "/Users/evilkylin/Projects/ragas/src/ragas/llms/base.py", line 178, in agenerate_text
return await self.langchain_llm.agenerate_prompt(
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 787, in agenerate_prompt
return await self.agenerate(
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 747, in agenerate
raise exceptions[0]
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 923, in _agenerate_with_cache
result = await self._agenerate(
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/langchain_openai/chat_models/base.py", line 843, in _agenerate
response = await self.async_client.create(**payload)
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/openai/resources/chat/completions.py", line 1412, in create
return await self._post(
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/openai/_base_client.py", line 1831, in post
return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/openai/_base_client.py", line 1525, in request
return await self._request(
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/openai/_base_client.py", line 1598, in _request
raise APIConnectionError(request=request) from err
openai.APIConnectionError: Connection error.
```
Expected behavior
It should be able to generate the test dataset.
@KylinMountain Hello there! I'm here to help you with any bugs, questions, or contributions you may have. Let's get to the bottom of this together!
I found some similar issues that might help you resolve the "Event loop is closed" error and connection issues:

- Open Issue: How can I set config parameters for test data generation
  - Solution: Configure parameters like `max_workers`, `rate_limits`, and `timeouts` to avoid connection timeout errors. Example:

    ```python
    from ragas.testset.generator import TestsetGenerator
    from ragas.testset.evolutions import simple, reasoning, multi_context
    from langchain_openai import ChatOpenAI, OpenAIEmbeddings
    from ragas.run_config import RunConfig

    generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
    critic_llm = ChatOpenAI(model="gpt-4")
    embeddings = OpenAIEmbeddings()

    run_config = RunConfig(
        max_workers=5,
        rate_limits={"requests_per_minute": 60},
        timeouts={"connect_timeout": 10, "read_timeout": 30}
    )

    generator = TestsetGenerator.from_langchain(
        generator_llm=generator_llm,
        critic_llm=critic_llm,
        embeddings=embeddings,
        run_config=run_config
    )

    testset = generator.generate_with_langchain_docs(
        documents=documents,
        test_size=10,
        distributions={simple: 0.5, reasoning: 0.3, multi_context: 0.2},
        with_debugging_logs=False,
        is_async=True,
        raise_exceptions=False,
        run_config=run_config
    )
    ```

- Closed Issue: Given code @ https://github.com/explodinggradients/ragas is not working and raises an exception
  - Solution: Use the `nest_asyncio` library to allow nested use of `asyncio.run()`. Example:

    ```python
    import nest_asyncio
    nest_asyncio.apply()
    ```

- Closed Issue: Error coming in example code for Answer Correctness
  - Solution: Set the `is_async` parameter to `False` when calling the `evaluate` function to avoid conflicts with the existing event loop. Example:

    ```python
    score = evaluate(dataset, metrics=[answer_correctness], is_async=False)
    ```
To resolve the "Event loop is closed" error in your specific case, you can modify your code to ensure the event loop is properly managed. Here is an example:

```python
import asyncio
from ragas.testset.generator import TestsetGenerator

async def main():
    generator_llm = ChatOpenAI(model_name="phi3.5:latest", openai_api_base="http://localhost:11434/v1", temperature=0.7)
    critic_llm = ChatOpenAI(model_name="phi3.5:latest", openai_api_base="http://localhost:11434/v1", temperature=0.7)
    hf_embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-base-zh-v1.5", model_kwargs={"device": "mps"})

    generator = TestsetGenerator.from_langchain(
        generator_llm,
        critic_llm,
        hf_embeddings,
        run_config=RunConfig(max_workers=1, seed=42, max_retries=3, log_tenacity=True, timeout=180),
    )

    testset = await generator.generate_with_langchain_docs(
        documents=documents,
        test_size=10,
        distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
        run_config=RunConfig(max_workers=1, seed=42, max_retries=1, log_tenacity=True),
        with_debugging_logs=True,
        is_async=True
    )

    ds = testset.to_dataset()
    ds.save_to_disk("./activity_testset")

if __name__ == "__main__":
    asyncio.run(main())
```
This should help manage the event loop correctly and avoid the "Event loop is closed" error [1][2][3][4].
To continue talking to Dosu, mention @dosu.
@shahules786 Do you know why? I am always getting this error; it's upsetting.
hey @KylinMountain that is a bummer. We have a new version of testset generation that is out - will you consider trying that out?
you will have to install it from source though
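(One common way to install from source, assuming the main repository: `pip install git+https://github.com/explodinggradients/ragas.git`.)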
@jjmachan I would like to try. Are there any instructions on how to use the new version? Is the API the same as the current one? I've already installed from source.
```python
import nest_asyncio
nest_asyncio.apply()
```

Adding this helped in my case.
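For reference, a minimal sketch of where the patch goes in the repro script above; the only assumption is that it runs before any ragas call:

```python
# Top of gen.py, before anything from ragas runs.
import nest_asyncio

# Patch asyncio so ragas' internal asyncio.run() calls can nest inside
# an already-running loop instead of closing the loop that httpx's
# async client still holds open connections on.
nest_asyncio.apply()

# ...rest of the repro script unchanged (loader, LLMs, TestsetGenerator,
# generate_with_langchain_docs, save_to_disk).
```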
can confirm
`import nest_asyncio; nest_asyncio.apply()` solved my issue.
@KylinMountain please do the following

- upgrade to ragas 0.2
- add:

  ```python
  import nest_asyncio
  nest_asyncio.apply()
  ```

- follow the guide here:

```
pip install -U ragas langchain
pip install unstructured
brew install libmagic
pip install --upgrade nltk
```

https://github.com/bdytx5/ragas_tutorial/blob/main/demo_data_gen.py

Here's an updated script.
> @KylinMountain please do the following
>
> - upgrade to ragas 0.2
> - add `import nest_asyncio; nest_asyncio.apply()`
> - follow the guide here

Thanks for your update, let me have a try.
@shahules786 It looks like some extra settings are needed when using another LLM that is compatible with the OpenAI SDK.
@KylinMountain Can you share more details?
@shahules786 Yeah. I had to set the transforms myself:
```python
import nest_asyncio
from langchain_community.document_loaders import DirectoryLoader
from langchain_openai import ChatOpenAI
from ragas.embeddings import HuggingfaceEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.testset import TestsetGenerator
from ragas.testset.synthesizers import AbstractQuerySynthesizer, ComparativeAbstractQuerySynthesizer, \
    SpecificQuerySynthesizer
from ragas.testset.transforms import Parallel, SummaryCosineSimilarityBuilder, CosineSimilarityBuilder, \
    EmbeddingExtractor, HeadlineSplitter, HeadlinesExtractor, TitleExtractor, KeyphrasesExtractor, SummaryExtractor, \
    Transforms

nest_asyncio.apply()

loader = DirectoryLoader("/Users/evilkylin/Projects/graphrag/data/proj-xz/input")
documents = loader.load()
for document in documents:
    document.metadata['filename'] = document.metadata['source']

generator_llm = LangchainLLMWrapper(ChatOpenAI(model="deepseek-chat", openai_api_base="https://api.deepseek.com/v1",
                                               openai_api_key="xxx", n=1))
hf_embeddings = HuggingfaceEmbeddings(model_name="BAAI/bge-base-zh-v1.5", model_kwargs={"device": "mps"})

generator = TestsetGenerator(llm=generator_llm)
query_distribution = [
    (AbstractQuerySynthesizer(llm=generator_llm), 0.25),
    (ComparativeAbstractQuerySynthesizer(llm=generator_llm), 0.25),
    (SpecificQuerySynthesizer(llm=generator_llm), 0.5),
]


def default_transforms() -> Transforms:
    from ragas.testset.graph import NodeType

    # define the transforms
    summary_extractor = SummaryExtractor(llm=generator_llm)
    keyphrase_extractor = KeyphrasesExtractor(llm=generator_llm)
    title_extractor = TitleExtractor(llm=generator_llm)
    headline_extractor = HeadlinesExtractor(llm=generator_llm)
    embedding_extractor = EmbeddingExtractor(embedding_model=hf_embeddings)
    headline_splitter = HeadlineSplitter()
    cosine_sim_builder = CosineSimilarityBuilder(threshold=0.8)
    summary_embedder = EmbeddingExtractor(
        name="summary_embedder",
        property_name="summary_embedding",
        embed_property_name="summary",
        filter_nodes=lambda node: True if node.type == NodeType.DOCUMENT else False,
        embedding_model=hf_embeddings,
    )
    summary_cosine_sim_builder = SummaryCosineSimilarityBuilder(threshold=0.6)

    # specify the transforms and their order to be applied
    transforms = [
        Parallel(summary_extractor, headline_extractor),
        summary_embedder,
        headline_splitter,
        Parallel(embedding_extractor, keyphrase_extractor, title_extractor),
        cosine_sim_builder,
        summary_cosine_sim_builder,
    ]
    return transforms


dataset = generator.generate_with_langchain_docs(
    documents,
    testset_size=10,
    transforms=default_transforms(),
    query_distribution=query_distribution,
    raise_exceptions=True,
    with_debugging_logs=True,
)
ds = dataset.to_dataset()
ds.save_to_disk("./activity_testset")
```
This is my code. I copied the default transforms from the source code and set the llm on each of them, because default transforms such as SummaryExtractor are otherwise created without one:
```python
summary_extractor = SummaryExtractor()


@dataclass
class SummaryExtractor(LLMBasedExtractor):
    """
    Extracts a summary from the given text.

    Attributes
    ----------
    property_name : str
        The name of the property to extract.
    prompt : SummaryExtractorPrompt
        The prompt used for extraction.
    """

    property_name: str = "summary"
    prompt: SummaryExtractorPrompt = SummaryExtractorPrompt()


@dataclass
class LLMBasedExtractor(Extractor, PromptMixin):
    llm: BaseRagasLLM = field(default_factory=llm_factory)
    merge_if_possible: bool = True


def llm_factory(
    model: str = "gpt-4o-mini",
    run_config: t.Optional[RunConfig] = None,
    default_headers: t.Optional[t.Dict[str, str]] = None,
    base_url: t.Optional[str] = None,
) -> BaseRagasLLM:
    ...
```
As you can see, if you don't set an llm on these transforms, they fall back to llm_factory with its default parameters, which means OpenAI.
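To make that fallback concrete, a minimal sketch reusing `generator_llm` from the script above:

```python
# Defaults to llm_factory(), i.e. an OpenAI "gpt-4o-mini" client,
# hence the surprise OpenAI dependency:
summary_extractor = SummaryExtractor()

# Explicitly passing the wrapped OpenAI-compatible LLM avoids that:
summary_extractor = SummaryExtractor(llm=generator_llm)
```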
I think we should pass the llm and embeddings through to the inner default_transforms and initialize the transforms there, like this:
```python
def default_transforms(llm: BaseRagasLLM, embedding: BaseRagasEmbeddings) -> Transforms:
    """
    Creates and returns a default set of transforms for processing a knowledge graph.

    This function defines a series of transformation steps to be applied to a
    knowledge graph, including extracting summaries, keyphrases, titles,
    headlines, and embeddings, as well as building similarity relationships
    between nodes.

    The transforms are applied in the following order:
    1. Parallel extraction of summaries and headlines
    2. Embedding of summaries for document nodes
    3. Splitting of headlines
    4. Parallel extraction of embeddings, keyphrases, and titles
    5. Building cosine similarity relationships between nodes
    6. Building cosine similarity relationships between summaries

    Returns
    -------
    Transforms
        A list of transformation steps to be applied to the knowledge graph.
    """
    from ragas.testset.graph import NodeType

    # define the transforms
    summary_extractor = SummaryExtractor(llm=llm)
    keyphrase_extractor = KeyphrasesExtractor(llm=llm)
    title_extractor = TitleExtractor(llm=llm)
    headline_extractor = HeadlinesExtractor(llm=llm)
    embedding_extractor = EmbeddingExtractor(embedding_model=embedding)
    headline_splitter = HeadlineSplitter()
    cosine_sim_builder = CosineSimilarityBuilder(threshold=0.8)
    summary_embedder = EmbeddingExtractor(
        name="summary_embedder",
        property_name="summary_embedding",
        embed_property_name="summary",
        filter_nodes=lambda node: True if node.type == NodeType.DOCUMENT else False,
        embedding_model=embedding,  # use the passed-in embeddings
    )
    summary_cosine_sim_builder = SummaryCosineSimilarityBuilder(threshold=0.6)

    # specify the transforms and their order to be applied
    transforms = [
        Parallel(summary_extractor, headline_extractor),
        summary_embedder,
        headline_splitter,
        Parallel(embedding_extractor, keyphrase_extractor, title_extractor),
        cosine_sim_builder,
        summary_cosine_sim_builder,
    ]
    return transforms
```
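A sketch of the call site under this proposal (not the current API; the signature is the one suggested above):

```python
# Hypothetical call site if default_transforms accepted llm/embedding:
transforms = default_transforms(llm=generator_llm, embedding=hf_embeddings)

dataset = generator.generate_with_langchain_docs(
    documents,
    testset_size=10,
    transforms=transforms,
    query_distribution=query_distribution,
)
```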
Even though I have set these transforms, it still reports errors like `ValueError: no clusters found. Try running a few transforms to populate the dataset`:
```
/Users/evilkylin/Projects/miniconda3/envs/starag-eva/bin/python /Users/evilkylin/Projects/starag-eva/gen.py
Generating Scenarios: 0%| | 0/3 [00:00<?, ?it/s]
Generating common_concepts: 50%|█████ | 1/2 [00:05<00:05, 5.11s/it]
Generating common_concepts: 100%|██████████| 2/2 [00:06<00:00, 2.80s/it]
Traceback (most recent call last):
File "/Users/evilkylin/Projects/starag-eva/gen.py", line 91, in <module>
dataset = generator.generate_with_langchain_docs(documents,
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/testset/synthesizers/generate.py", line 92, in generate_with_langchain_docs
return self.generate(
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/testset/synthesizers/generate.py", line 186, in generate
scenario_sample_list: t.List[t.List[BaseScenario]] = exec.results()
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/executor.py", line 146, in results
results = asyncio.run(_aresults())
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/nest_asyncio.py", line 30, in run
return loop.run_until_complete(task)
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/nest_asyncio.py", line 98, in run_until_complete
return f.result()
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/asyncio/futures.py", line 201, in result
raise self._exception.with_traceback(self._exception_tb)
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/asyncio/tasks.py", line 232, in __step
result = coro.send(None)
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/executor.py", line 141, in _aresults
r = await future
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/asyncio/tasks.py", line 571, in _wait_for_one
return f.result() # May raise f.exception().
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/asyncio/futures.py", line 201, in result
raise self._exception.with_traceback(self._exception_tb)
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/asyncio/tasks.py", line 232, in __step
result = coro.send(None)
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/executor.py", line 36, in sema_coro
return await coro
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/executor.py", line 81, in wrapped_callable_async
raise e
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/executor.py", line 78, in wrapped_callable_async
result = await callable(*args, **kwargs)
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/testset/synthesizers/base.py", line 89, in generate_scenarios
scenarios = await self._generate_scenarios(
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/testset/synthesizers/specific_query.py", line 61, in _generate_scenarios
sampled_nodes = random.choices(nodes, k=n)
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/random.py", line 519, in choices
return [population[floor(random() * n)] for i in _repeat(None, k)]
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/random.py", line 519, in <listcomp>
return [population[floor(random() * n)] for i in _repeat(None, k)]
IndexError: list index out of range
Task exception was never retrieved
future: <Task finished name='Task-285' coro=<as_completed.<locals>.sema_coro() done, defined at /Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/executor.py:34> exception=ValueError('no clusters found. Try running a few transforms to populate the dataset')>
Traceback (most recent call last):
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/asyncio/tasks.py", line 232, in __step
result = coro.send(None)
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/executor.py", line 36, in sema_coro
return await coro
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/executor.py", line 81, in wrapped_callable_async
raise e
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/executor.py", line 78, in wrapped_callable_async
result = await callable(*args, **kwargs)
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/testset/synthesizers/base.py", line 89, in generate_scenarios
scenarios = await self._generate_scenarios(
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/testset/synthesizers/abstract_query.py", line 96, in _generate_scenarios
raise ValueError(
ValueError: no clusters found. Try running a few transforms to populate the dataset
Process finished with exit code 1
```
Regarding the `ValueError: no clusters found. Try running a few transforms to populate the dataset` error: I think it is because my input file is a txt with only a few hundred tokens, which is not enough to generate a dataset. You can try it.
When I changed the input source to a PDF file, generation worked.
@shahules786 Would you be able to fix these issues?
Another issue: it can't save the generated dataset, failing with `NameError: name 'EvaluationDataset' is not defined`.
Too many issues... this is really upsetting.
```
dataset.to_pandas()
Traceback (most recent call last):
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/code.py", line 90, in runcode
exec(code, self.locals)
File "<input>", line 1, in <module>
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/dataset_schema.py", line 197, in to_pandas
data = self._to_list()
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/testset/synthesizers/testset_schema.py", line 52, in _to_list
eval_list = self.to_evaluation_dataset()._to_list()
File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/testset/synthesizers/testset_schema.py", line 47, in to_evaluation_dataset
return EvaluationDataset(
NameError: name 'EvaluationDataset' is not defined
```
Just comment out the if statement here:

```python
# if t.TYPE_CHECKING:
from ragas.dataset_schema import (
    EvaluationDataset,
    MultiTurnSample,
    SingleTurnSample,
)
```

Comment out the if statement and dedent the code underneath. The file should be at /Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/testset/synthesizers/testset_schema.py.
@shahules786 this is a pretty big issue
@KylinMountain @bdytx5 I'm working on the fix for that - a patch release will be out in a few hours
@KylinMountain a patch release is out
> Regarding the `ValueError: no clusters found. Try running a few transforms to populate the dataset` error: I think it is because my input file is a txt with only a few hundred tokens, which is not enough to generate a dataset. You can try it. When I changed the input source to a PDF file, generation worked.

This is because the knowledge graph might not have enough nodes to find clusters. For small documents, SpecificQuerySynthesizer should be enough.
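A minimal sketch of that suggestion, reusing the names from the script above (the 1.0 weight is illustrative, not prescribed):

```python
# For small documents, route everything through SpecificQuerySynthesizer
# and skip the cluster-dependent abstract synthesizers.
query_distribution = [
    (SpecificQuerySynthesizer(llm=generator_llm), 1.0),
]

dataset = generator.generate_with_langchain_docs(
    documents,
    testset_size=10,
    query_distribution=query_distribution,
)
```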