agenta
[AGE-370] Improve reproducibility of AI critique outputs
AI critique gives different results from run to run. The goal of this issue is to determine and implement the best practices and parameters for running AI critique so that its outputs are more reliable.
The first step is to survey the best practices used in other OSS libraries and in the literature.
From SyncLinear.com | AGE-370
I looked into how Ragas handles this. In the default mode, they set the temperature to 1e-8. To increase reproducibility (for example in CI), they raise the temperature to 0.3 and run each call three times.
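For reference, here is a minimal sketch of the "run each call several times" idea. It is not Ragas code and not agenta's implementation. The `judge` callable and the `critique_with_voting` name are hypothetical, and combining the runs by majority vote is an assumption, since the note above does not say how the three results are aggregated.

```python
from collections import Counter
from typing import Callable, Hashable


def critique_with_voting(
    judge: Callable[[str, float], Hashable],  # hypothetical: (prompt, temperature) -> verdict
    prompt: str,
    temperature: float = 0.3,  # value quoted in the comment above
    n_runs: int = 3,           # value quoted in the comment above
) -> Hashable:
    """Call the judge n_runs times and return the most common verdict."""
    if n_runs < 1:
        raise ValueError("n_runs must be at least 1")
    # Each call is independent; the judge is expected to wrap the actual LLM request.
    verdicts = [judge(prompt, temperature) for _ in range(n_runs)]
    # Assumption: aggregate by majority vote. On a tie, Counter keeps
    # first-seen order, so the earliest verdict among the tied ones wins.
    return Counter(verdicts).most_common(1)[0][0]
```

With a stub judge that returns "pass", "fail", "pass" on successive calls, `critique_with_voting(stub, "Is the answer grounded?")` returns "pass". Verdicts need to be hashable (labels or integer scores); for continuous scores a mean or median would likely be a better fit than a vote.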
This has been resolved