
Error 'openai.APIConnectionError: Connection error.' occurs when running generator.generate_with_langchain_docs().

Open francescofan opened this issue 1 year ago • 5 comments

[ ] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug Error 'openai.APIConnectionError: Connection error.' occurs when running generator.generate_with_langchain_docs().

Ragas version: 0.1.20 Python version: 3.11.9 System: Windows 10 IDE: Visual Studio Code

Code to Reproduce

loader = DirectoryLoader("D:\\xxxxxx\\test_doc")
documents = loader.load()
generator_llm = ChatOpenAI(model='gpt-3.5-turbo-16k')
critic_llm = ChatOpenAI(model='gpt-4')
embeddings = OpenAIEmbeddings()
generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings
)
testset = generator.generate_with_langchain_docs(documents, test_size=10, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25})

Error trace

Generating: 0%| | 0/10 [00:01<?, ?it/s]
Traceback (most recent call last):
  ...
  File "D:\anaconda3\envs\next_console_qa_evaluate\Lib\asyncio\proactor_events.py", line 401, in _loop_writing
    self._write_fut = self._loop._proactor.send(self._sock, data)
AttributeError: 'NoneType' object has no attribute 'send'

During handling of the above exception, another exception occurred:
  ...
  File "D:\anaconda3\envs\next_console_qa_evaluate\Lib\asyncio\base_events.py", line 520, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed

During handling of the above exception, another exception occurred:
  ...
  File "D:\anaconda3\envs\next_console_qa_evaluate\Lib\site-packages\openai\_base_client.py", line 1598, in _request
    raise APIConnectionError(request=request) from err
openai.APIConnectionError: Connection error.

Expected behavior Run successfully and get the 10 generated results.

Additional context The OpenAI connection itself is OK. I can get completions from the OpenAI API with normal requests.

francescofan avatar Oct 06 '24 01:10 francescofan

The error you're encountering, openai.APIConnectionError: Connection error., when using generate_with_langchain_docs() in the Ragas library, might be related to how asynchronous operations and event loops are managed in your environment.

Here are some steps and considerations that might help resolve the issue:

  1. Check Event Loop Management: The error trace indicates issues with the event loop, specifically RuntimeError: Event loop is closed. This can occur if the event loop is not properly managed or if there are conflicts with existing loops. Ensure that your code correctly handles asynchronous operations. The Ragas library uses asyncio for managing asynchronous tasks, and it checks for existing event loops. If you're running this in an environment like Jupyter or VS Code, consider using nest_asyncio to allow nested event loops.

  2. Use nest_asyncio: If you're in an interactive environment, try installing and applying nest_asyncio to allow the existing event loop to run new asynchronous tasks:

    pip install nest_asyncio
    

    Then, in your code, apply it before running asynchronous tasks:

    import nest_asyncio
    nest_asyncio.apply()
    
  3. Connection Stability: Ensure that your network connection is stable. The Ragas library configures the OpenAI API client with settings that could affect connection stability, such as timeouts. Check if the timeout setting in your RunConfig is appropriate for your network conditions.

  4. Check OpenAI API Configuration: Verify that your OpenAI API client is correctly configured. The Ragas library uses LangchainLLMWrapper to manage the OpenAI client, and settings like timeout and base_url can impact connection stability. Ensure these are set correctly.

  5. Retry Mechanisms: If the issue persists, consider implementing retry mechanisms to handle transient connection errors. This can be done by catching the APIConnectionError and retrying the request after a short delay.
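  Regarding point 1: the `proactor_events.py` frames in the original traceback implicate Windows' default ProactorEventLoop. A workaround often suggested for "Event loop is closed" errors on Windows (untested for this exact case) is switching to the selector-based loop policy before any ragas calls:

  ```python
  import asyncio
  import sys

  # On Windows, Python 3.8+ defaults to the ProactorEventLoop, which the
  # traceback above passes through. Switching to the selector-based policy
  # before any async work starts is a common workaround; the guard makes
  # this a no-op on other platforms.
  if sys.platform == "win32":
      asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
  ```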
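  Point 5 can be sketched as a small wrapper. This is purely illustrative, not part of ragas or the OpenAI SDK; `call_with_retries` is a hypothetical helper name, and you would pass `openai.APIConnectionError` in `exceptions` when wrapping an OpenAI call:

  ```python
  import time

  def call_with_retries(fn, exceptions=(ConnectionError,), attempts=5, delay=2.0):
      """Call `fn`, retrying on the given transient error types.

      `fn` is any zero-argument callable that performs the request.
      Pass openai.APIConnectionError in `exceptions` for OpenAI calls.
      """
      for attempt in range(attempts):
          try:
              return fn()
          except exceptions:
              if attempt == attempts - 1:
                  raise  # out of attempts: surface the original error
              time.sleep(delay * (attempt + 1))  # simple linear backoff
  ```

  Note that `ChatOpenAI` already exposes `max_retries`, so a wrapper like this is only worth adding if the built-in retries are exhausted.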

By addressing these areas, you should be able to mitigate the connection error and successfully generate the test set.


dosubot[bot] avatar Oct 06 '24 01:10 dosubot[bot]


Hi, I encountered the same error as you. Have you solved it now?

Root970103 avatar Oct 11 '24 02:10 Root970103

Same here. I don't think it is a firewall/network issue on my network, because the OpenAI embeddings are successfully created. It is only the OpenAI LLM access that is problematic. I also tried different (i.e., less popular, hopefully less congested) models, and I increased the timeouts and even attempted to use the non-async version, all with no luck.

generator_llm = ChatOpenAI(model="gpt-4-turbo", max_tokens=1024, timeout=30, max_retries=10, verbose=True)
critic_llm = ChatOpenAI(model="gpt-3.5-turbo", max_tokens=1024, timeout=30, max_retries=10, verbose=True)
embeddings = OpenAIEmbeddings(model="text-embedding-3-large", dimensions=1024)

generator = TestsetGenerator.from_langchain(generator_llm, critic_llm, embeddings)

# Change resulting question type distribution
distributions = {simple: 0.5, multi_context: 0.4, reasoning: 0.1}

testset = generator.generate_with_langchain_docs(docs, 10, distributions, with_debugging_logs=True, is_async=False)

Producing:

...
  File "/myproject/venvs/3.11-n/lib/python3.11/site-packages/openai/resources/chat/completions.py", line 1490, in create
    return await self._post(
           ^^^^^^^^^^^^^^^^^
  File "/myproject/venvs/3.11-n/lib/python3.11/site-packages/openai/_base_client.py", line 1838, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/myproject/venvs/3.11-n/lib/python3.11/site-packages/openai/_base_client.py", line 1532, in request
    return await self._request(
           ^^^^^^^^^^^^^^^^^^^^
  File "/myproject/venvs/3.11-n/lib/python3.11/site-packages/openai/_base_client.py", line 1605, in _request
    raise APIConnectionError(request=request) from err
openai.APIConnectionError: Connection error.

It's a little frustrating because I'm paying for the embeddings each time, only to fail on the first LLM call. Maybe the "getting started" example needs some work to separate the two?

trevorbowen avatar Oct 11 '24 13:10 trevorbowen

@trevorbowen I saw a possible solution in another issue.

import nest_asyncio
nest_asyncio.apply()

Add the above code. Although I don't know the underlying mechanism, it does help me avoid the issue. Hope it can help you too. :)

Root970103 avatar Oct 12 '24 07:10 Root970103

I had the same issue.

I tried to solve it by adding the following code, but it didn't help.

import nest_asyncio
nest_asyncio.apply()

Then I tried increasing these attributes on the embeddings object

# embeddings is your embeddings object
embeddings.max_retries=10
embeddings.timeout=50

and even by setting is_async=False, but the error still appears.

magni5 avatar Oct 12 '24 15:10 magni5

I don't know exactly what fixed the issue, but this setup worked without any exceptions being raised. Replace the uppercase variables with your own objects.

import asyncio 

from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import (
    simple, reasoning, multi_context, conditional
)

from ragas.run_config import RunConfig


async def generate_testset():
    run_config = RunConfig(
        timeout=60*30, # 30 minutes
        seed=42,
        max_retries=30,
        max_wait=60*30, # 30 minutes
        max_workers=1
    )

    EMBEDDINGS.max_retries = 20
    EMBEDDINGS.request_timeout = 1000

    generator = TestsetGenerator.from_langchain(
        QUESTIONS_GENERATOR_LLM, 
        ANSWERS_GENERATOR_LLM, 
        EMBEDDINGS,
        run_config=run_config
    )
        
    testset = generator.generate_with_langchain_docs(
        DOCS,
        test_size=20, 
        distributions={
            simple: 0.25, 
            reasoning: 0.25, 
            multi_context: 0.25,
            conditional: 0.25
        },
        raise_exceptions=False,
        is_async=False,
        with_debugging_logs=True,
        run_config=run_config
    )

    # ...

asyncio.run(generate_testset())

My dependencies are

ragas==0.1.21
langchain==0.2.16
langchain-openai==0.1.25

magni5 avatar Oct 13 '24 19:10 magni5


@magni5 Hello, can you provide the code for reading files? I ran your version and got the following problem

Filename and doc_id are the same for all nodes.
Generating: 0%| | 0/20 [00:00<?, ?it/s]
Exception raised in Job[0]: AttributeError("'LangchainLLMWrapper' object has no attribute 'agenerate_prompt'")
(the same AttributeError is raised for every job, Job[1] through Job[19])

Well-xu avatar Oct 15 '24 05:10 Well-xu

@Well-xu

I generate my DOCS in this way:

from langchain_community.document_loaders import PyPDFLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

path = 'the_path_to_your_file.pdf_or_txt'

if path.endswith('.pdf'):
    loader = PyPDFLoader(path)
elif path.endswith('.txt'):
    loader = TextLoader(path)
else:
    raise ValueError(f'Unsupported file type: {path}')

documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=CHUNK_SIZE,
    chunk_overlap=CHUNK_OVERLAP
)
DOCS = text_splitter.split_documents(documents)

# Give each chunk a distinct title and a filename, so the
# "Filename and doc_id are the same for all nodes" warning goes away
stem = path.split('/')[-1]
for i, doc in enumerate(DOCS):
    doc.metadata['title'] = f'{stem}_{i}'
    doc.metadata['filename'] = doc.metadata['source']

magni5 avatar Oct 15 '24 09:10 magni5