
Can't generate a test dataset: always a connection error and "Event loop is closed".

Open KylinMountain opened this issue 1 year ago • 20 comments

[x] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug The LLM is served by Ollama and is reachable: key phrase extraction succeeds, so this is not a plain connection problem. Question generation, however, always fails.

Ragas version: latest source code. Python version: 3.10

Code to Reproduce

from typing import Dict

from langchain_community.document_loaders import DirectoryLoader
from langchain_openai import ChatOpenAI
from langchain_community.embeddings import HuggingFaceEmbeddings
from llama_index.llms.openai import utils
from ragas import RunConfig
from ragas.testset.evolutions import simple, reasoning, multi_context
from ragas.testset.generator import TestDataset, TestsetGenerator

loader = DirectoryLoader("/Users/xxx/Projects/graphrag/input")
documents = loader.load()

for document in documents:
    document.metadata['filename'] = document.metadata['source']

import logging

logging.basicConfig(level=logging.DEBUG)

generator_llm = ChatOpenAI(model_name="phi3.5:latest", openai_api_base="http://localhost:11434/v1", temperature=0.7)

critic_llm = ChatOpenAI(model_name="phi3.5:latest", openai_api_base="http://localhost:11434/v1", temperature=0.7)

hf_embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-base-zh-v1.5", model_kwargs={"device": "mps"})

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    hf_embeddings,
    run_config=RunConfig(max_workers=1, seed=42, max_retries=3, log_tenacity=True, timeout=180),
)

# generate testset
testset: TestDataset = generator.generate_with_langchain_docs(documents,
                                                              test_size=10,
                                                              distributions={simple: 0.5, reasoning: 0.25,
                                                                             multi_context: 0.25},
                                                              run_config=RunConfig(max_workers=1, seed=42,
                                                                                   max_retries=1,
                                                                                   log_tenacity=True),
                                                              with_debugging_logs=True,
                                                              is_async=False)

ds = testset.to_dataset()
ds.save_to_disk("./activity_testset")

Error trace

DEBUG:openai._base_client:Request options: {'method': 'post', 'url': '/chat/completions', 'files': None, 'json_data': {'messages': [{'content': 'Generate a question that can be fully answered from given context. The question should be formed using topic\n\nExamples:\n\ncontext: "Photosynthesis in plants involves converting light energy into chemical energy, using chlorophyll and other pigments to absorb light. This process is crucial for plant growth and the production of oxygen."\nkeyphrase: "Photosynthesis"\nquestion: "What is the role of photosynthesis in plant growth?"\n\ncontext: "The Industrial Revolution, starting in the 18th century, marked a major turning point in history as it led to the development of factories and urbanization."\nkeyphrase: "Industrial Revolution"\nquestion: "How did the Industrial Revolution mark a major turning point in history?"\n\ncontext: "The process of evaporation plays a crucial role in the water cycle, converting water from liquid to vapor and allowing it to rise into the atmosphere."\nkeyphrase: "Evaporation"\nquestion: "Why is evaporation important in the water cycle?"\n\nYour actual task:\n\ncontext: "在2023年10月10日,陈刚带着他的文化交流团队,共计5人,以及多件传统工艺品,前往杭州的一个文化展览中心。当天,他们参加了一场文化交流展览。陈刚和他的团队展示了他们精心制作的传统手工艺品,并现场演示了制作过程。他们的展品吸引了大量参观者的关注,并获得了文化界人士的高度评价。展览结束后,他们还与其他文化工作者进行了交流,探讨了传统文化的传承和创新。"\nkeyphrase: "杭州的文化展览中心"\nquestion: \n', 'role': 'user'}], 'model': 'phi3.5:latest', 'n': 1, 'stream': False, 'temperature': 0}}
DEBUG:httpcore.connection:close.started
DEBUG:httpcore.connection:close.failed exception=RuntimeError('Event loop is closed')
DEBUG:openai._base_client:Encountered Exception
Traceback (most recent call last):
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/openai/_base_client.py", line 1564, in _request
    response = await self._client.send(
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpx/_client.py", line 1674, in send
    response = await self._send_handling_auth(
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpx/_client.py", line 1702, in _send_handling_auth
    response = await self._send_handling_redirects(
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpx/_client.py", line 1739, in _send_handling_redirects
    response = await self._send_single_request(request)
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpx/_client.py", line 1776, in _send_single_request
    response = await transport.handle_async_request(request)
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpx/_transports/default.py", line 377, in handle_async_request
    resp = await self._pool.handle_async_request(req)
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpcore/_async/connection_pool.py", line 216, in handle_async_request
    raise exc from None
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpcore/_async/connection_pool.py", line 189, in handle_async_request
    await self._close_connections(closing)
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpcore/_async/connection_pool.py", line 305, in _close_connections
    await connection.aclose()
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpcore/_async/connection.py", line 171, in aclose
    await self._connection.aclose()
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpcore/_async/http11.py", line 265, in aclose
    await self._network_stream.aclose()
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpcore/_backends/anyio.py", line 55, in aclose
    await self._stream.aclose()
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 1258, in aclose
    self._transport.close()
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/asyncio/selector_events.py", line 706, in close
    self._loop.call_soon(self._call_connection_lost, None)
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/asyncio/base_events.py", line 753, in call_soon
    self._check_closed()
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/asyncio/base_events.py", line 515, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
DEBUG:openai._base_client:Raising connection error
Generating:   0%|          | 0/10 [00:29<?, ?it/s]
Traceback (most recent call last):
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/openai/_base_client.py", line 1564, in _request
    response = await self._client.send(
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpx/_client.py", line 1674, in send
    response = await self._send_handling_auth(
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpx/_client.py", line 1702, in _send_handling_auth
    response = await self._send_handling_redirects(
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpx/_client.py", line 1739, in _send_handling_redirects
    response = await self._send_single_request(request)
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpx/_client.py", line 1776, in _send_single_request
    response = await transport.handle_async_request(request)
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpx/_transports/default.py", line 377, in handle_async_request
    resp = await self._pool.handle_async_request(req)
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpcore/_async/connection_pool.py", line 216, in handle_async_request
    raise exc from None
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpcore/_async/connection_pool.py", line 189, in handle_async_request
    await self._close_connections(closing)
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpcore/_async/connection_pool.py", line 305, in _close_connections
    await connection.aclose()
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpcore/_async/connection.py", line 171, in aclose
    await self._connection.aclose()
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpcore/_async/http11.py", line 265, in aclose
    await self._network_stream.aclose()
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/httpcore/_backends/anyio.py", line 55, in aclose
    await self._stream.aclose()
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 1258, in aclose
    self._transport.close()
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/asyncio/selector_events.py", line 706, in close
    self._loop.call_soon(self._call_connection_lost, None)
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/asyncio/base_events.py", line 753, in call_soon
    self._check_closed()
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/asyncio/base_events.py", line 515, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/evilkylin/Projects/starag-eva/gen.py", line 44, in <module>
    testset: TestDataset = generator.generate_with_langchain_docs(documents,
  File "/Users/evilkylin/Projects/ragas/src/ragas/testset/generator.py", line 210, in generate_with_langchain_docs
    return self.generate(
  File "/Users/evilkylin/Projects/ragas/src/ragas/_analytics.py", line 129, in wrapper
    result = func(*args, **kwargs)
  File "/Users/evilkylin/Projects/ragas/src/ragas/testset/generator.py", line 304, in generate
    test_data_rows = exec.results()
  File "/Users/evilkylin/Projects/ragas/src/ragas/executor.py", line 118, in results
    results = asyncio.run(_aresults())
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/Users/evilkylin/Projects/ragas/src/ragas/executor.py", line 113, in _aresults
    r = await future
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/asyncio/tasks.py", line 571, in _wait_for_one
    return f.result()  # May raise f.exception().
  File "/Users/evilkylin/Projects/ragas/src/ragas/executor.py", line 34, in sema_coro
    return await coro
  File "/Users/evilkylin/Projects/ragas/src/ragas/executor.py", line 61, in wrapped_callable_async
    raise e
  File "/Users/evilkylin/Projects/ragas/src/ragas/executor.py", line 55, in wrapped_callable_async
    result = await callable(*args, **kwargs)
  File "/Users/evilkylin/Projects/ragas/src/ragas/testset/evolutions.py", line 143, in evolve
    ) = await self._aevolve(current_tries, current_nodes)
  File "/Users/evilkylin/Projects/ragas/src/ragas/testset/evolutions.py", line 467, in _aevolve
    simple_question, current_nodes, _ = await self.se._aevolve(
  File "/Users/evilkylin/Projects/ragas/src/ragas/testset/evolutions.py", line 304, in _aevolve
    results = await self.generator_llm.generate(
  File "/Users/evilkylin/Projects/ragas/src/ragas/llms/base.py", line 96, in generate
    return await agenerate_text_with_retry(
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 189, in async_wrapped
    return await copy(fn, *args, **kwargs)
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 111, in __call__
    do = await self.iter(retry_state=retry_state)
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 153, in iter
    result = await action(retry_state)
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/tenacity/_utils.py", line 99, in inner
    return call(*args, **kwargs)
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/tenacity/__init__.py", line 398, in <lambda>
    self._add_action_func(lambda rs: rs.outcome.result())
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 114, in __call__
    result = await fn(*args, **kwargs)
  File "/Users/evilkylin/Projects/ragas/src/ragas/llms/base.py", line 178, in agenerate_text
    return await self.langchain_llm.agenerate_prompt(
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 787, in agenerate_prompt
    return await self.agenerate(
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 747, in agenerate
    raise exceptions[0]
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 923, in _agenerate_with_cache
    result = await self._agenerate(
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/langchain_openai/chat_models/base.py", line 843, in _agenerate
    response = await self.async_client.create(**payload)
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/openai/resources/chat/completions.py", line 1412, in create
    return await self._post(
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/openai/_base_client.py", line 1831, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/openai/_base_client.py", line 1525, in request
    return await self._request(
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/openai/_base_client.py", line 1598, in _request
    raise APIConnectionError(request=request) from err
openai.APIConnectionError: Connection error.

Expected behavior Should be able to generate testdataset.

Additional context Add any other context about the problem here.
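A possible reading of the trace above, offered as an assumption rather than a confirmed diagnosis: the traceback shows asyncio.run being called from ragas/executor.py, and a pooled connection being closed on a loop that is already closed. That pattern usually appears when an async resource created under one asyncio.run call is touched again under a later one, because each asyncio.run call creates a fresh loop and closes it on exit. The stdlib-only sketch below reproduces just that mechanism; the names remember_loop and captured_loops are illustrative and are not part of ragas.

```python
import asyncio

captured_loops = []


async def remember_loop():
    # Record the loop this coroutine runs on, similar to how a cached
    # async HTTP connection stays tied to the loop that created it.
    captured_loops.append(asyncio.get_running_loop())


# Each asyncio.run() call creates a new event loop and closes it on exit.
asyncio.run(remember_loop())
asyncio.run(remember_loop())

first_loop, second_loop = captured_loops

try:
    # Scheduling work on the first loop after it was closed fails with the
    # same message that appears in the trace.
    first_loop.call_soon(lambda: None)
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

print(first_loop.is_closed(), first_loop is second_loop, error_message)
```

If this is what happens here, anything that keeps a single loop alive for the whole run, or that recreates the client for each run, would avoid the stale-loop access.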

KylinMountain avatar Sep 30 '24 03:09 KylinMountain

@KylinMountain Hello there! I'm here to help you with any bugs, questions, or contributions you may have. Let's get to the bottom of this together!

I found some similar issues that might help you resolve the "Event loop is closed" error and connection issues:

  1. Open Issue: How can I set config parameters for test data generation

    • Solution: Configure RunConfig parameters such as max_workers, timeout, and max_retries to avoid connection timeout errors. Example:
    from ragas.testset.generator import TestsetGenerator
    from ragas.testset.evolutions import simple, reasoning, multi_context
    from langchain_openai import ChatOpenAI, OpenAIEmbeddings
    from ragas.run_config import RunConfig
    
    generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
    critic_llm = ChatOpenAI(model="gpt-4")
    embeddings = OpenAIEmbeddings()
    
    run_config = RunConfig(
        max_workers=5,   # cap the number of concurrent requests
        timeout=30,      # per-call timeout in seconds
        max_retries=3,   # retry failed calls a few times
    )
    
    generator = TestsetGenerator.from_langchain(
        generator_llm=generator_llm,
        critic_llm=critic_llm,
        embeddings=embeddings,
        run_config=run_config
    )
    
    testset = generator.generate_with_langchain_docs(
        documents=documents,
        test_size=10,
        distributions={simple: 0.5, reasoning: 0.3, multi_context: 0.2},
        with_debugging_logs=False,
        is_async=True,
        raise_exceptions=False,
        run_config=run_config
    )
    
  2. Closed Issue: Given code @ https://github.com/explodinggradients/ragas is not working and raises an exception

    • Solution: Use the nest_asyncio library to allow nested use of asyncio.run(). Example:
    import nest_asyncio
    nest_asyncio.apply()
    
  3. Closed Issue: Error coming in example code for Answer Correctness

    • Solution: Set the is_async parameter to False when calling the evaluate function to avoid conflicts with the existing event loop. Example:
    score = evaluate(dataset, metrics=[answer_correctness], is_async=False)
    

To resolve the "Event loop is closed" error in your specific case, you can modify your code to ensure the event loop is properly managed. Here is an example:

import asyncio
from ragas.testset.generator import TestsetGenerator

async def main():
    generator_llm = ChatOpenAI(model_name="phi3.5:latest", openai_api_base="http://localhost:11434/v1", temperature=0.7)
    critic_llm = ChatOpenAI(model_name="phi3.5:latest", openai_api_base="http://localhost:11434/v1", temperature=0.7)
    hf_embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-base-zh-v1.5", model_kwargs={"device": "mps"})

    generator = TestsetGenerator.from_langchain(
        generator_llm,
        critic_llm,
        hf_embeddings,
        run_config=RunConfig(max_workers=1, seed=42, max_retries=3, log_tenacity=True, timeout=180),
    )

    testset = await generator.generate_with_langchain_docs(
        documents=documents,
        test_size=10,
        distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
        run_config=RunConfig(max_workers=1, seed=42, max_retries=1, log_tenacity=True),
        with_debugging_logs=True,
        is_async=True
    )

    ds = testset.to_dataset()
    ds.save_to_disk("./activity_testset")

if __name__ == "__main__":
    asyncio.run(main())

This should help manage the event loop correctly and avoid the "Event loop is closed" error [1][2][3][4].

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar Sep 30 '24 03:09 dosubot[bot]

@shahules786 Do you know why? I always get this error, and it's frustrating.

KylinMountain avatar Sep 30 '24 03:09 KylinMountain

hey @KylinMountain that is a bummer. We have a new version of testset generation that is out - will you consider trying that out?

you will have to install it from source though

jjmachan avatar Sep 30 '24 16:09 jjmachan

@jjmachan I would like to try it. Are there any instructions on how to use the new version? Is it the same as the current API? I've already installed from source.

KylinMountain avatar Oct 01 '24 02:10 KylinMountain

import nest_asyncio
nest_asyncio.apply()

Adding this helped in my case.
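For context on what nest_asyncio changes: by default, asyncio refuses to start a second asyncio.run while a loop is already running in the same thread, which is the situation in notebooks or when a synchronous library call that uses asyncio.run internally is invoked from async code. The stdlib-only sketch below shows that default behaviour; sync_api is a hypothetical stand-in for such a library call. Whether this is the exact mechanism behind the error in this thread is not established here.

```python
import asyncio


async def inner():
    return "done"


def sync_api():
    # Hypothetical stand-in for a synchronous library function that
    # drives its own coroutines with asyncio.run().
    coro = inner()
    try:
        return asyncio.run(coro)
    except RuntimeError:
        coro.close()  # avoid a "coroutine was never awaited" warning
        raise


async def outer():
    # A loop is already running here, so the nested asyncio.run() is rejected.
    try:
        return sync_api()
    except RuntimeError as exc:
        return f"RuntimeError: {exc}"


result = asyncio.run(outer())
print(result)
```

nest_asyncio patches asyncio so that this kind of nested call is allowed instead of raising.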

Niggl0n avatar Oct 04 '24 12:10 Niggl0n

Can confirm that adding import nest_asyncio and calling nest_asyncio.apply() solved my issue.

bdytx5 avatar Oct 15 '24 08:10 bdytx5

@KylinMountain please do the following

  1. upgrade to ragas 0.2
  2. add import nest_asyncio and call nest_asyncio.apply()
  3. follow guide here

shahules786 avatar Oct 15 '24 13:10 shahules786

pip install -U ragas langchain
pip install unstructured
brew install libmagic
pip install --upgrade nltk

https://github.com/bdytx5/ragas_tutorial/blob/main/demo_data_gen.py

Here's an updated script

bdytx5 avatar Oct 16 '24 02:10 bdytx5

@KylinMountain please do the following

  1. upgrade to ragas 0.2
  2. add import nest_asyncio and call nest_asyncio.apply()
  3. follow guide here

Thanks for your update, let me have a try.

KylinMountain avatar Oct 16 '24 03:10 KylinMountain

@shahules786 It looks like some extra configuration is needed when using another LLM that is compatible with the OpenAI SDK.

KylinMountain avatar Oct 16 '24 03:10 KylinMountain

@KylinMountain Can you share more details?

shahules786 avatar Oct 16 '24 03:10 shahules786

@shahules786 Yeah. I have to set the transforms explicitly:

import nest_asyncio
from langchain_community.document_loaders import DirectoryLoader
from langchain_openai import ChatOpenAI
from ragas.embeddings import HuggingfaceEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.testset import TestsetGenerator
from ragas.testset.synthesizers import AbstractQuerySynthesizer, ComparativeAbstractQuerySynthesizer, \
    SpecificQuerySynthesizer
from ragas.testset.transforms import Parallel, SummaryCosineSimilarityBuilder, CosineSimilarityBuilder, \
    EmbeddingExtractor, HeadlineSplitter, HeadlinesExtractor, TitleExtractor, KeyphrasesExtractor, SummaryExtractor, \
    Transforms

nest_asyncio.apply()

loader = DirectoryLoader("/Users/evilkylin/Projects/graphrag/data/proj-xz/input")
documents = loader.load()

for document in documents:
    document.metadata['filename'] = document.metadata['source']

generator_llm = LangchainLLMWrapper(ChatOpenAI(model="deepseek-chat", openai_api_base="https://api.deepseek.com/v1",
                                               openai_api_key="xxx", n=1))


hf_embeddings = HuggingfaceEmbeddings(model_name="BAAI/bge-base-zh-v1.5", model_kwargs={"device": "mps"})

generator = TestsetGenerator(llm=generator_llm)

query_distribution = [
    (AbstractQuerySynthesizer(llm=generator_llm), 0.25),
    (ComparativeAbstractQuerySynthesizer(llm=generator_llm), 0.25),
    (SpecificQuerySynthesizer(llm=generator_llm), 0.5),
]


def default_transforms() -> Transforms:
    from ragas.testset.graph import NodeType

    # define the transforms
    summary_extractor = SummaryExtractor(llm=generator_llm)
    keyphrase_extractor = KeyphrasesExtractor(llm=generator_llm)
    title_extractor = TitleExtractor(llm=generator_llm)
    headline_extractor = HeadlinesExtractor(llm=generator_llm)
    embedding_extractor = EmbeddingExtractor(embedding_model=hf_embeddings)
    headline_splitter = HeadlineSplitter()
    cosine_sim_builder = CosineSimilarityBuilder(threshold=0.8)
    summary_embedder = EmbeddingExtractor(
        name="summary_embedder",
        property_name="summary_embedding",
        embed_property_name="summary",
        filter_nodes=lambda node: True if node.type == NodeType.DOCUMENT else False,
        embedding_model=hf_embeddings,
    )
    summary_cosine_sim_builder = SummaryCosineSimilarityBuilder(threshold=0.6)

    # specify the transforms and their order to be applied
    transforms = [
        Parallel(summary_extractor, headline_extractor),
        summary_embedder,
        headline_splitter,
        Parallel(embedding_extractor, keyphrase_extractor, title_extractor),
        cosine_sim_builder,
        summary_cosine_sim_builder,
    ]
    return transforms


dataset = generator.generate_with_langchain_docs(documents,
                                                 testset_size=10,
                                                 transforms=default_transforms(),
                                                 query_distribution=query_distribution,
                                                 raise_exceptions=True,
                                                 with_debugging_logs=True
                                                 )

ds = dataset.to_dataset()
ds.save_to_disk("./activity_testset")

This is my code. I copied the default transforms from the source code and passed my llm to each of them, because the default transforms, such as SummaryExtractor, are created without one:

summary_extractor = SummaryExtractor()
@dataclass
class SummaryExtractor(LLMBasedExtractor):
    """
    Extracts a summary from the given text.

    Attributes
    ----------
    property_name : str
        The name of the property to extract.
    prompt : SummaryExtractorPrompt
        The prompt used for extraction.
    """

    property_name: str = "summary"
    prompt: SummaryExtractorPrompt = SummaryExtractorPrompt()
    
@dataclass
class LLMBasedExtractor(Extractor, PromptMixin):
    llm: BaseRagasLLM = field(default_factory=llm_factory)
    merge_if_possible: bool = True

def llm_factory(
    model: str = "gpt-4o-mini",
    run_config: t.Optional[RunConfig] = None,
    default_headers: t.Optional[t.Dict[str, str]] = None,
    base_url: t.Optional[str] = None,
) -> BaseRagasLLM:

As you can see, if you don't pass an llm to these transforms, they fall back to llm_factory with its default parameters, which means OpenAI.
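The fallback comes from how dataclass fields with default_factory behave: the factory runs whenever the field is omitted at construction time. A minimal stdlib sketch of that behaviour follows; ExtractorSketch and default_llm_factory are hypothetical stand-ins for the ragas classes quoted above, and plain strings stand in for LLM objects.

```python
from dataclasses import dataclass, field


def default_llm_factory() -> str:
    # Hypothetical stand-in for llm_factory(): returns the library default.
    return "openai:gpt-4o-mini"


@dataclass
class ExtractorSketch:
    # Mirrors the pattern `llm: BaseRagasLLM = field(default_factory=llm_factory)`.
    llm: str = field(default_factory=default_llm_factory)


implicit = ExtractorSketch()                     # no llm passed: factory is used
explicit = ExtractorSketch(llm="deepseek-chat")  # llm passed: factory is skipped

print(implicit.llm, explicit.llm)
```

This is why every LLM-based transform has to receive the custom llm explicitly when the default provider is not the one you want.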

KylinMountain avatar Oct 16 '24 03:10 KylinMountain

I think we should pass the llm and embedding into the built-in default_transforms and initialize the transforms like this:

def default_transforms(llm: BaseRagasLLM, embedding: BaseRagasEmbeddings) -> Transforms:
    """
    Creates and returns a default set of transforms for processing a knowledge graph.

    This function defines a series of transformation steps to be applied to a
    knowledge graph, including extracting summaries, keyphrases, titles,
    headlines, and embeddings, as well as building similarity relationships
    between nodes.

    The transforms are applied in the following order:
    1. Parallel extraction of summaries and headlines
    2. Embedding of summaries for document nodes
    3. Splitting of headlines
    4. Parallel extraction of embeddings, keyphrases, and titles
    5. Building cosine similarity relationships between nodes
    6. Building cosine similarity relationships between summaries

    Returns
    -------
    Transforms
        A list of transformation steps to be applied to the knowledge graph.

    """
    from ragas.testset.graph import NodeType

    # define the transforms
    summary_extractor = SummaryExtractor(llm=llm)
    keyphrase_extractor = KeyphrasesExtractor(llm=llm)
    title_extractor = TitleExtractor(llm=llm)
    headline_extractor = HeadlinesExtractor(llm=llm)
    embedding_extractor = EmbeddingExtractor(embedding_model=embedding)
    headline_splitter = HeadlineSplitter()
    cosine_sim_builder = CosineSimilarityBuilder(threshold=0.8)
    summary_embedder = EmbeddingExtractor(
        name="summary_embedder",
        property_name="summary_embedding",
        embed_property_name="summary",
        filter_nodes=lambda node: True if node.type == NodeType.DOCUMENT else False,
        embedding_model=embedding,
    )
    summary_cosine_sim_builder = SummaryCosineSimilarityBuilder(threshold=0.6)

    # specify the transforms and their order to be applied
    transforms = [
        Parallel(summary_extractor, headline_extractor),
        summary_embedder,
        headline_splitter,
        Parallel(embedding_extractor, keyphrase_extractor, title_extractor),
        cosine_sim_builder,
        summary_cosine_sim_builder,
    ]
    return transforms

KylinMountain avatar Oct 16 '24 03:10 KylinMountain

Even with these transforms set, it still reports errors such as ValueError: no clusters found. Try running a few transforms to populate the dataset.

/Users/evilkylin/Projects/miniconda3/envs/starag-eva/bin/python /Users/evilkylin/Projects/starag-eva/gen.py 
Generating Scenarios:   0%|          | 0/3 [00:00<?, ?it/s]

Generating common_concepts:  50%|█████     | 1/2 [00:05<00:05,  5.11s/it]
Generating common_concepts: 100%|██████████| 2/2 [00:06<00:00,  2.80s/it]
Traceback (most recent call last):
  File "/Users/evilkylin/Projects/starag-eva/gen.py", line 91, in <module>
    dataset = generator.generate_with_langchain_docs(documents,
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/testset/synthesizers/generate.py", line 92, in generate_with_langchain_docs
    return self.generate(
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/testset/synthesizers/generate.py", line 186, in generate
    scenario_sample_list: t.List[t.List[BaseScenario]] = exec.results()
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/executor.py", line 146, in results
    results = asyncio.run(_aresults())
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/nest_asyncio.py", line 30, in run
    return loop.run_until_complete(task)
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/nest_asyncio.py", line 98, in run_until_complete
    return f.result()
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/asyncio/futures.py", line 201, in result
    raise self._exception.with_traceback(self._exception_tb)
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/asyncio/tasks.py", line 232, in __step
    result = coro.send(None)
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/executor.py", line 141, in _aresults
    r = await future
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/asyncio/tasks.py", line 571, in _wait_for_one
    return f.result()  # May raise f.exception().
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/asyncio/futures.py", line 201, in result
    raise self._exception.with_traceback(self._exception_tb)
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/asyncio/tasks.py", line 232, in __step
    result = coro.send(None)
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/executor.py", line 36, in sema_coro
    return await coro
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/executor.py", line 81, in wrapped_callable_async
    raise e
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/executor.py", line 78, in wrapped_callable_async
    result = await callable(*args, **kwargs)
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/testset/synthesizers/base.py", line 89, in generate_scenarios
    scenarios = await self._generate_scenarios(
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/testset/synthesizers/specific_query.py", line 61, in _generate_scenarios
    sampled_nodes = random.choices(nodes, k=n)
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/random.py", line 519, in choices
    return [population[floor(random() * n)] for i in _repeat(None, k)]
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/random.py", line 519, in <listcomp>
    return [population[floor(random() * n)] for i in _repeat(None, k)]
IndexError: list index out of range
Task exception was never retrieved
future: <Task finished name='Task-285' coro=<as_completed.<locals>.sema_coro() done, defined at /Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/executor.py:34> exception=ValueError('no clusters found. Try running a few transforms to populate the dataset')>
Traceback (most recent call last):
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/asyncio/tasks.py", line 232, in __step
    result = coro.send(None)
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/executor.py", line 36, in sema_coro
    return await coro
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/executor.py", line 81, in wrapped_callable_async
    raise e
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/executor.py", line 78, in wrapped_callable_async
    result = await callable(*args, **kwargs)
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/testset/synthesizers/base.py", line 89, in generate_scenarios
    scenarios = await self._generate_scenarios(
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/testset/synthesizers/abstract_query.py", line 96, in _generate_scenarios
    raise ValueError(
ValueError: no clusters found. Try running a few transforms to populate the dataset

Process finished with exit code 1

KylinMountain avatar Oct 16 '24 03:10 KylinMountain

I think the error "ValueError: no clusters found. Try running a few transforms to populate the dataset" occurs because my input is a txt file of only several hundred tokens, which is not enough to generate a dataset. You can try it yourself.

When I changed the input source to a PDF file, generation worked.

@shahules786 Could you fix these issues?

KylinMountain avatar Oct 16 '24 04:10 KylinMountain

Another issue: I can't save the generated dataset. It fails with NameError: name 'EvaluationDataset' is not defined.

Too many issues... this is really upsetting.

dataset.to_pandas()
Traceback (most recent call last):
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/dataset_schema.py", line 197, in to_pandas
    data = self._to_list()
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/testset/synthesizers/testset_schema.py", line 52, in _to_list
    eval_list = self.to_evaluation_dataset()._to_list()
  File "/Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/testset/synthesizers/testset_schema.py", line 47, in to_evaluation_dataset
    return EvaluationDataset(
NameError: name 'EvaluationDataset' is not defined

KylinMountain avatar Oct 16 '24 06:10 KylinMountain

Just comment out the if statement here:

# if t.TYPE_CHECKING:
from ragas.dataset_schema import (
    EvaluationDataset,
    MultiTurnSample,
    SingleTurnSample,
)

Comment out the if statement and dedent the import underneath it. The file should be at /Users/evilkylin/Projects/miniconda3/envs/starag-eva/lib/python3.10/site-packages/ragas/testset/synthesizers/testset_schema.py
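
For anyone wondering why this workaround helps: imports placed under `typing.TYPE_CHECKING` are only seen by static type checkers and are skipped at runtime, so any runtime use of those names raises NameError. Below is a minimal standalone sketch of that behavior. It uses `decimal.Decimal` as a stand-in for `EvaluationDataset`, and the function names are hypothetical; it is not the actual ragas code.

```python
import typing as t

if t.TYPE_CHECKING:
    # Evaluated only by static type checkers; this branch is skipped at runtime.
    from decimal import Decimal


def make_decimal_broken(value: str) -> "Decimal":
    # Decimal was never imported at runtime, so this lookup raises NameError.
    return Decimal(value)


def make_decimal_fixed(value: str) -> "Decimal":
    # Importing for real (here at call time) makes the name available at runtime.
    from decimal import Decimal

    return Decimal(value)


try:
    make_decimal_broken("1.5")
except NameError as exc:
    error_message = str(exc)
else:
    error_message = None

result = make_decimal_fixed("1.5")
print(error_message)
print(result)
```

Moving the import out of the `if` block, as suggested above, has the same effect as the "fixed" variant: the name exists at runtime.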

bdytx5 avatar Oct 16 '24 07:10 bdytx5

@shahules786 this is a pretty big issue

bdytx5 avatar Oct 16 '24 07:10 bdytx5

@KylinMountain @bdytx5 I'm working on the fix for that - a patch release will be out in a few hours

jjmachan avatar Oct 16 '24 08:10 jjmachan

@KylinMountain a patch release is out

I think the error "ValueError: no clusters found. Try running a few transforms to populate the dataset" occurs because my input is a txt file of only several hundred tokens, which is not enough to generate a dataset. You can try it yourself.

When I changed the input source to a PDF file, generation worked.

This is because the knowledge graph might not have enough nodes to form clusters. For small documents, SpecificQuerySynthesizer alone should be enough.
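
The IndexError in the traceback above is consistent with the same root cause: `random.choices` fails with "list index out of range" when the population is empty. A minimal sketch of that failure and of an early guard follows. The `sample_nodes` helper and its message are hypothetical illustrations, not part of ragas.

```python
import random


def sample_nodes(nodes: list, k: int) -> list:
    """Sample k nodes with replacement, failing early if there is nothing to sample."""
    if not nodes:
        # Hypothetical guard: surface a readable error instead of an IndexError.
        raise ValueError("no nodes to sample from; the input may be too small")
    return random.choices(nodes, k=k)


# Unguarded call: sampling from an empty list raises IndexError.
try:
    random.choices([], k=3)
except IndexError:
    unguarded_failed = True
else:
    unguarded_failed = False

# Guarded call: the same situation produces a descriptive ValueError.
try:
    sample_nodes([], k=3)
except ValueError as exc:
    guarded_message = str(exc)
else:
    guarded_message = None

print(unguarded_failed)
print(guarded_message)
```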

jjmachan avatar Oct 16 '24 09:10 jjmachan