LangchainLLMWrapper sets a default temperature even with models that don't support it (e.g. o3-mini)
- [x] I have checked the documentation and related resources and couldn't resolve my bug.
**Describe the bug**
When I try to create an LLM instance using `LangchainLLMWrapper` and specify a model that doesn't support the `temperature` parameter, I cannot get any responses from the API.
Ragas version: 0.2.13
Python version: 3.10.16
**Code to Reproduce**

```python
from langchain_openai import ChatOpenAI
from ragas.dataset_schema import SingleTurnSample
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import FactualCorrectness

evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model_name="o3-mini"))
sample = SingleTurnSample(
    response="The Eiffel Tower is located in Paris.",
    reference="The Eiffel Tower is located in Paris. It has a height of 1000ft.",
)

scorer = FactualCorrectness(llm=evaluator_llm)
await scorer.single_turn_ascore(sample)
```
**Error trace**

```
openai.BadRequestError: Error code: 400 - {'error': {'message': "Unsupported parameter: 'temperature' is not supported with this model.", 'type': 'invalid_request_error', 'param': 'temperature', 'code': 'unsupported_parameter'}}
```
**Expected behavior**
The wrapper should not send a `temperature` parameter (or should only send a value the model accepts) when the underlying model doesn't support one, so the scoring call completes instead of failing with a 400 error.
I guess the solution provided won't work when using AzureChatOpenAI (`from langchain_openai.chat_models import AzureChatOpenAI`):

- if we pass `temperature` as `None` -> AzureChatOpenAI throws an error (`'temperature' must be valid float`)
- if we remove the `temperature` attribute from the `LangchainLLMWrapper` object -> AzureChatOpenAI throws an error (`AttributeError("'AzureChatOpenAI' object has no attribute 'temperature'")`); both attempts are sketched after this list
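For concreteness, here is a minimal reconstruction of the two failing attempts described above. The deployment name is a placeholder, I'm assuming the wrapper stores the model as `langchain_llm`, and the exact way the attribute was removed isn't shown in the comment, so `del` is my guess:

```python
# Hedged reconstruction of the two failing attempts above.
# "o3-mini" is a placeholder deployment name; Azure credentials are
# assumed to be set via environment variables.
from langchain_openai.chat_models import AzureChatOpenAI
from ragas.llms import LangchainLLMWrapper

# Attempt 1: temperature=None is rejected at construction time by
# pydantic validation ("'temperature' must be valid float").
try:
    AzureChatOpenAI(azure_deployment="o3-mini", temperature=None)
except Exception as e:
    print(e)

# Attempt 2: removing the attribute after construction makes every
# later access fail with AttributeError: 'AzureChatOpenAI' object
# has no attribute 'temperature'.
evaluator_llm = LangchainLLMWrapper(AzureChatOpenAI(azure_deployment="o3-mini"))
del evaluator_llm.langchain_llm.temperature
```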
We need a generic solution that works with all existing clients for reasoning models (o1, o3-mini, etc.).
Quick fix: in `.tesla_rasa_eval/lib/python3.11/site-packages/ragas/llms/base.py`, change the `get_temperature` function:
```python
def get_temperature(self, n: int) -> float:
    """Return the temperature to use for completion based on n."""
    return 0.3 if n > 1 else n
```
Changes:

- replaced `1e-8` with `n`
- pass `temperature = 0` for reasoning models (o1, o3-mini, etc.)
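If editing files under site-packages is undesirable, the same quick fix can be applied at runtime by overriding the method; a minimal sketch, assuming ragas 0.2.x where `LangchainLLMWrapper` inherits `get_temperature` from `BaseRagasLLM`:

```python
# Runtime variant of the quick fix above: override get_temperature
# on the wrapper class instead of editing the installed package.
from ragas.llms import LangchainLLMWrapper

def _patched_get_temperature(self, n: int) -> float:
    """Return the temperature to use for completion based on n."""
    return 0.3 if n > 1 else n

LangchainLLMWrapper.get_temperature = _patched_get_temperature
```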
I am encountering similar issues; see these two:
- https://github.com/Azure/azure-sdk-for-python/issues/39938
- https://github.com/langchain-ai/langchain/issues/30126
Maybe we could just add a conditional statement here specifically for the o1 and o3 family, like you suggested @mukul1609 (a rough sketch follows below). There is also an article experimenting with reasoning models in an evaluation pipeline: https://www.reddit.com/r/Rag/comments/1ixs5wx/we_evaluated_if_reasoning_models_like_o3mini_can/?rdt=60212.
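A rough sketch of what that conditional might look like in `get_temperature`. The prefix list and the attribute lookup (`langchain_llm`, `model_name`/`model`) are my assumptions, not ragas' actual API; note the 400 in the original report suggests some endpoints reject the parameter outright, in which case the calling code would also need to skip sending it:

```python
# Hypothetical conditional for the o1/o3 family (sketch only, not ragas' code).
REASONING_MODEL_PREFIXES = ("o1", "o3")

def get_temperature(self, n: int) -> float:
    """Return the temperature to use for completion based on n."""
    # Assumption: the wrapped langchain model exposes its name via
    # `model_name` or `model` (true for ChatOpenAI/AzureChatOpenAI today).
    model = getattr(self.langchain_llm, "model_name", None) or getattr(
        self.langchain_llm, "model", ""
    )
    if str(model).startswith(REASONING_MODEL_PREFIXES):
        # Reasoning models reportedly accept only the default temperature;
        # if the endpoint rejects the parameter entirely, the caller must
        # additionally avoid sending it at all.
        return 1.0
    return 0.3 if n > 1 else 1e-8
```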