
Testset generator not working with Azure OpenAI key.

Open rahul1-995 opened this issue 1 year ago • 10 comments

I am trying to generate synthetic data using the Azure OpenAI API. It takes a long time to run and then fails with an error.

Ragas version: 0.1.1
Python version: 3.10

Code to Reproduce (reflowed; imports added for completeness)

import os

from langchain.text_splitter import TokenTextSplitter
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from llama_index import SimpleDirectoryReader
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.llms import LangchainLLMWrapper
from ragas.testset import TestsetGenerator
from ragas.testset.docstore import InMemoryDocumentStore
from ragas.testset.evolutions import simple, reasoning, multi_context
from ragas.testset.extractor import KeyphraseExtractor

os.environ["AZURE_OPENAI_API_KEY"] = "AZURE_OPENAI_API_KEY"

azure_configs_gen = {
    "base_url": "",
    "model_deployment": "gpt-35-turbo-16k",
    "model_name": "gpt-35-turbo-16k",
    "embedding_deployment": "text-embedding-ada-002",
    "embedding_name": "text-embedding-ada-002",
}

azure_configs_critic = {
    "base_url": "",
    "model_deployment": "gpt-4",
    "model_name": "gpt-4",
    "embedding_deployment": "text-embedding-ada-002",
    "embedding_name": "text-embedding-ada-002",
}

generator_llm = AzureChatOpenAI(
    openai_api_version="2023-05-15",
    azure_endpoint=azure_configs_gen["base_url"],
    azure_deployment=azure_configs_gen["model_deployment"],
    model=azure_configs_gen["model_name"],
    validate_base_url=False,
)
generator_llm = LangchainLLMWrapper(generator_llm)

critic_llm = AzureChatOpenAI(
    openai_api_version="2023-05-15",
    azure_endpoint=azure_configs_critic["base_url"],
    azure_deployment=azure_configs_critic["model_deployment"],
    model=azure_configs_critic["model_name"],
    validate_base_url=False,
)
# Note: the original snippet wrapped generator_llm again on the next line,
# which looks like a copy-paste slip; critic_llm is what should be wrapped.
critic_llm = LangchainLLMWrapper(critic_llm)

embed_model = AzureOpenAIEmbeddings(
    openai_api_version="2023-05-15",
    azure_endpoint=azure_configs_gen["base_url"],
    azure_deployment=azure_configs_gen["embedding_deployment"],
    model=azure_configs_gen["embedding_name"],
)
embed_model = LangchainEmbeddingsWrapper(embed_model)

pdf_path = r"machinelearning-lecture01.pdf"
documents = SimpleDirectoryReader(input_files=[pdf_path]).load_data()

splitter = TokenTextSplitter(chunk_size=2000, chunk_overlap=100)
keyphrase_extractor = KeyphraseExtractor(llm=generator_llm)
docstore = InMemoryDocumentStore(
    splitter=splitter,
    embeddings=embed_model,
    extractor=keyphrase_extractor,
)

test_generator = TestsetGenerator(
    generator_llm=generator_llm,
    critic_llm=critic_llm,
    embeddings=embed_model,
    docstore=docstore,
)

testset = test_generator.generate_with_llamaindex_docs(
    documents=documents[:5],
    test_size=3,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)

Error trace

Exception in thread Thread-7:
Traceback (most recent call last):
  File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\site-packages\ragas\executor.py", line 75, in run
    results = self.loop.run_until_complete(self._aresults())
  File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\asyncio\base_events.py", line 653, in run_until_complete
    return future.result()
  File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\site-packages\ragas\executor.py", line 63, in _aresults
    raise e
  File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\site-packages\ragas\executor.py", line 58, in _aresults
    r = await future
  File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\asyncio\tasks.py", line 615, in _wait_for_one
    return f.result()  # May raise f.exception().
  File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\site-packages\ragas\executor.py", line 91, in wrapped_callable_async
    return counter, await callable(*args, **kwargs)
  File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\site-packages\ragas\testset\evolutions.py", line 150, in evolve
    ) = await self.aevolve(current_tries, current_nodes)
  File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\site-packages\ragas\testset\evolutions.py", line 253, in aevolve
    passed = await self.node_filter.filter(current_nodes.root_node)
  File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\site-packages\ragas\testset\filters.py", line 54, in filter
    results = await self.llm.generate(prompt=prompt)
  File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\site-packages\ragas\llms\base.py", line 92, in generate
    return await agenerate_text_with_retry(
  File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\site-packages\tenacity\_asyncio.py", line 88, in async_wrapped
    return await fn(*args, **kwargs)
  File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\site-packages\tenacity\_asyncio.py", line 47, in __call__
    do = self.iter(retry_state=retry_state)
  File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\site-packages\tenacity\__init__.py", line 325, in iter
    raise retry_exc.reraise()
  File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\site-packages\tenacity\__init__.py", line 158, in reraise
    raise self.last_attempt.result()
  File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\concurrent\futures\_base.py", line 449, in result
    return self.__get_result()
  File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\concurrent\futures\_base.py", line 401, in __get_result
    raise self._exception
  File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\site-packages\tenacity\_asyncio.py", line 50, in __call__
    result = await fn(*args, **kwargs)
  File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\site-packages\ragas\llms\base.py", line 177, in agenerate_text
    result = await self.langchain_llm.agenerate_prompt(
AttributeError: 'LangchainLLMWrapper' object has no attribute 'agenerate_prompt'. Did you mean: 'agenerate_text'?
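One plausible reading of the AttributeError above: in ragas 0.1.x the testset generator wraps raw langchain LLMs in LangchainLLMWrapper itself, so passing an already-wrapped LLM (as the snippet above does) can leave the wrapper delegating to another wrapper, which has no agenerate_prompt. The sketch below is a minimal, standalone illustration of that failure mode; the classes are stand-ins, not the real ragas or langchain types.

```python
# Stand-in classes, not ragas source; they only mimic the suspected failure mode.
class StandInAzureChatOpenAI:
    """Plays the role of a raw langchain chat model (has agenerate_prompt)."""

    def agenerate_prompt(self, prompt):
        return f"completion for {prompt!r}"


class StandInLangchainLLMWrapper:
    """Plays the role of ragas' LangchainLLMWrapper (exposes agenerate_text)."""

    def __init__(self, langchain_llm):
        self.langchain_llm = langchain_llm

    def agenerate_text(self, prompt):
        # The wrapper calls the langchain API on whatever it wrapped:
        return self.langchain_llm.agenerate_prompt(prompt)


raw = StandInAzureChatOpenAI()
wrapped_once = StandInLangchainLLMWrapper(raw)            # what the internals expect
wrapped_twice = StandInLangchainLLMWrapper(wrapped_once)  # double wrap, as in the issue

print(wrapped_once.agenerate_text("q"))   # delegation reaches the raw model
try:
    wrapped_twice.agenerate_text("q")     # inner object is another wrapper
except AttributeError as err:
    print(err)                            # ...has no attribute 'agenerate_prompt'
```

If this reading is right, it also explains why the same code works when the unwrapped langchain objects are passed directly, as in the fix suggested later in this thread.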

---------------------------------------------------------------------------
ExceptionInRunner                         Traceback (most recent call last)
Cell In[4], line 18
      9 from ragas.testset.evolutions import simple, reasoning, multi_context
     11 test_generator = TestsetGenerator(
     12     generator_llm=generator_llm,
     13     critic_llm=critic_llm,
     14     embeddings=embed_model,
     15     docstore=docstore,
     16 )
---> 18 testset = test_generator.generate_with_llamaindex_docs(documents=documents[:5],
     19     test_size=3, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25})

File ~\Anaconda3\envs\genai\Lib\site-packages\ragas\testset\generator.py:128, in TestsetGenerator.generate_with_llamaindex_docs(self, documents, test_size, distributions, with_debugging_logs, is_async, raise_exceptions, run_config)
    113 def generate_with_llamaindex_docs(
    114     self,
    115     documents: t.Sequence[LlamaindexDocument],
        (...)
    122 ):
    123     # chunk documents and add to docstore
    124     self.docstore.add_documents(
    125         [Document.from_llamaindex_document(doc) for doc in documents]
    126     )
--> 128     return self.generate(
    129         test_size=test_size,
    130         distributions=distributions,
    131         with_debugging_logs=with_debugging_logs,
    132         is_async=is_async,
    133         run_config=run_config,
    134         raise_exceptions=raise_exceptions,
    135     )

File ~\Anaconda3\envs\genai\Lib\site-packages\ragas\testset\generator.py:246, in TestsetGenerator.generate(self, test_size, distributions, with_debugging_logs, is_async, raise_exceptions, run_config)
    244     test_data_rows = exec.results()
    245     if test_data_rows == []:
--> 246         raise ExceptionInRunner()
    248 except ValueError as e:
    249     raise e

ExceptionInRunner: The runner thread which was running the jobs raised an exeception. Read the traceback above to debug it. You can also pass raise_exception=False incase you want to show only a warning message instead.
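For context on why the real traceback appears above the ExceptionInRunner summary: ragas runs generation jobs in a separate runner thread, and the caller only learns that no results came back. A simplified sketch of that pattern (illustrative only, not ragas source; names are made up):

```python
import threading


def run_jobs(jobs, raise_exceptions=True):
    """Run callables in a worker thread; report failures back to the caller."""
    results, errors = [], []

    def worker():
        for job in jobs:
            try:
                results.append(job())
            except Exception as exc:
                # The failure happens here, in the worker thread, so its
                # traceback is printed before the caller's summary error.
                errors.append(exc)
                return

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    if errors and raise_exceptions:
        # The caller only sees a generic summary, mirroring ExceptionInRunner.
        raise RuntimeError("runner thread raised; see traceback above")
    return results


print(run_jobs([lambda: 1, lambda: 2]))                   # [1, 2]
print(run_jobs([lambda: 1 / 0], raise_exceptions=False))  # []
```

This is also why passing raise_exceptions=False to the generator yields an empty result with a warning rather than a hard failure.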

Expected behavior
It should generate the test dataset from the input PDF.

Additional context
The same error sometimes occurs when using an OpenAI key instead of an Azure OpenAI key.

rahul1-995 avatar Feb 20 '24 10:02 rahul1-995

Any updates or workarounds for the above problem?

rahul1-995 avatar Feb 21 '24 12:02 rahul1-995

Hey @rahul1-995, sorry for the late reply. Are you able to make evaluation work with Azure OpenAI? Can you try updating langchain-core?

shahules786 avatar Feb 22 '24 02:02 shahules786

Hi @shahules786, I am not facing a problem when evaluating with Azure OpenAI; the problem is with testset generation using Azure. I have given the code snippet above. Please refer to the error below: AttributeError: 'LangchainLLMWrapper' object has no attribute 'agenerate_prompt'. Did you mean: 'agenerate_text'? Can you please look into this?

rahul1-995 avatar Feb 22 '24 05:02 rahul1-995

Hi Rahul, can you explain how you ran evaluation with Azure OpenAI if you haven't generated the test data? I am facing the same problem. Did you generate the test data by any other method? If so, please share; I also need to create synthetic test data.

Pranshul200 avatar Feb 22 '24 11:02 Pranshul200

@Pranshul200, I am currently using an OpenAI API key for testset generation.

rahul1-995 avatar Feb 22 '24 12:02 rahul1-995

Hey @rahul1-995, did you try updating langchain-core as requested?

shahules786 avatar Feb 26 '24 17:02 shahules786

Yes @shahules786, I have tried updating langchain-core, but I am still not able to run the testset generator.

rahul1-995 avatar Feb 27 '24 06:02 rahul1-995

@rahul1-995 Can you try using the version of #670 (not merged yet)?

git clone https://github.com/mspronesti/ragas/
cd ragas
pip install . 

The usage with Azure OpenAI would be

from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
import os

os.environ["AZURE_OPENAI_API_KEY"] = "..."
os.environ["AZURE_OPENAI_ENDPOINT"] = "..."
os.environ["OPENAI_API_VERSION"] = "2023-12-01-preview"

generator_llm = AzureChatOpenAI(deployment_name="...")
critic_llm = AzureChatOpenAI(deployment_name="...")
embeddings = AzureOpenAIEmbeddings(deployment="...")

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings
)
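One extra check that may save a confusing failure later: the evolution weights passed as distributions must sum to 1.0. A standalone sketch of the check, with string keys standing in for ragas' simple/reasoning/multi_context evolution objects:

```python
def check_distributions(dist, tol=1e-9):
    """Raise if the evolution weights do not sum to 1.0 (within tolerance)."""
    total = sum(dist.values())
    if abs(total - 1.0) > tol:
        raise ValueError(f"evolution weights sum to {total}, expected 1.0")
    return True


# String keys stand in for the real evolution objects so this runs standalone.
distributions = {"simple": 0.5, "reasoning": 0.25, "multi_context": 0.25}
print(check_distributions(distributions))  # True
```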

mspronesti avatar Feb 28 '24 10:02 mspronesti

I'm not the original poster but I had the same problem and it disappeared in this version. Thanks :)

wikp avatar Feb 29 '24 05:02 wikp

@wikp Thanks for the confirmation!

mspronesti avatar Feb 29 '24 10:02 mspronesti

I'm not the original poster but I had the same problem and it disappeared in this version. Thanks :)

which version?

subho-das avatar Sep 17 '24 19:09 subho-das