BaseRagasLLM overriding temperature results in PydanticPrompt.generate generating identical responses due to temperature being mapped to effectively 0 when n is 1
- [x] I have checked the documentation and related resources and couldn't resolve my bug.
**Describe the bug**
When using ragas to evaluate the answer relevancy metric, all generated questions are identical. I have tracked this down through ResponseRelevancy._ascore -> PydanticPrompt.generate -> BaseRagasLLM.generate: the temperature gets set to effectively 0 when n = 1, and n is ALWAYS 1 when PydanticPrompt.generate is used.
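For reference, the relevant part of the chain looks roughly like this (paraphrased and heavily simplified from my reading of the ragas source, so the exact signatures will differ):

```python
# Paraphrased, simplified sketch of PydanticPrompt as I read it in ragas 0.2.14;
# the real methods take more parameters (stop, callbacks, retries_left, ...).
class PydanticPromptSketch:
    async def generate(self, llm, data, temperature=None):
        # generate() only ever asks for one completion, so n is hard-coded to 1
        # and forwarded all the way down to BaseRagasLLM.generate.
        items = await self.generate_multiple(
            llm=llm, data=data, n=1, temperature=temperature
        )
        return items[0]

    async def generate_multiple(self, llm, data, n=1, temperature=None):
        # Builds the prompt and eventually calls
        # llm.generate(prompt, n=n, temperature=temperature).
        raise NotImplementedError
```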
A potential solution would be for _ascore to call .generate_multiple once instead of looping over a range and calling .generate multiple times.
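Something along these lines is what I have in mind (untested sketch; the helper and attribute names are approximations of what ResponseRelevancy actually has):

```python
# Untested sketch of the suggestion; _build_prompt_input is a hypothetical
# stand-in for however ResponseRelevancy currently builds its prompt input,
# and _calculate_score is assumed to aggregate the generated questions.
async def _ascore(self, row, callbacks):
    prompt_input = self._build_prompt_input(row)  # hypothetical helper
    # One call with n=self.strictness instead of `strictness` separate
    # generate() calls. generate_multiple passes n > 1 down to the LLM, so the
    # temperature heuristic no longer collapses to ~0 and the questions vary.
    responses = await self.question_generation.generate_multiple(
        data=prompt_input,
        llm=self.llm,
        n=self.strictness,
        callbacks=callbacks,
    )
    return self._calculate_score(responses, row)
```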
That still leaves me wondering why the temperature configured on the LLM passed to ragas is not respected, and why 0.3 or 1e-8 is used instead, depending on n.
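For context, the override I mean looks roughly like this (paraphrased from BaseRagasLLM as I read it; the 0.3 / 1e-8 values match what I observe):

```python
# Paraphrased from BaseRagasLLM in ragas 0.2.14 as I read it.
def get_temperature(self, n: int) -> float:
    # When no explicit temperature is passed down the chain (the default from
    # PydanticPrompt.generate), this heuristic is used instead of whatever the
    # wrapped LLM was configured with: near-greedy for a single completion,
    # 0.3 when several completions are requested.
    return 0.3 if n > 1 else 1e-8
```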
Ragas version: 0.2.14
Python version: 3.12.3
**Code to Reproduce**
Run ragas.evaluate using ragas.metrics.answer_relevancy as the metric.
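A script along these lines reproduces it for me; the sample data and the OpenAI model/embeddings are just placeholders, and any single-turn dataset and evaluator LLM should show the same behaviour:

```python
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from ragas import EvaluationDataset, SingleTurnSample, evaluate
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import answer_relevancy

# One placeholder sample; any single-turn sample shows the same behaviour.
dataset = EvaluationDataset(
    samples=[
        SingleTurnSample(
            user_input="What is the capital of France?",
            response="Paris is the capital of France.",
            retrieved_contexts=["Paris is the capital and most populous city of France."],
        )
    ]
)

# The temperature configured here is what I would expect ragas to respect.
llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini", temperature=0.7))
embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

result = evaluate(dataset, metrics=[answer_relevancy], llm=llm, embeddings=embeddings)
print(result)
# Tracing the LLM calls shows the `strictness` questions generated for the
# sample are identical, because each one is produced at an effectively zero
# temperature.
```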
**Expected behavior**
The generated questions should vary.