[AGE-370] Improve reproducibility of AI critique outputs

mmabrouk opened this issue 1 year ago

AI critique returns different results from run to run on the same input. The goal of this issue is to identify and implement the best practices and parameters for running AI critique so that its results are more reliable.

The first step is to survey how other open-source libraries and the literature handle this.

From SyncLinear.com | AGE-370

Jun 28 '24 11:06 mmabrouk

I looked into how Ragas handles this. In the default mode, it sets the temperature to 1e-8. To increase reproducibility (for example, in CI), it raises the temperature to 0.3 and runs each call three times.
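A minimal Python sketch of the scheme described above, assuming the judge is any callable that takes a prompt and a temperature and returns a discrete verdict string. The names `pick_temperature`, `majority_vote`, and `critique` are hypothetical and are not Ragas's API. The comment only says each call is run three times; combining the runs by majority vote is an assumption made here for illustration.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical judge signature: (prompt, temperature) -> verdict string.
Judge = Callable[[str, float], str]


def pick_temperature(n_runs: int) -> float:
    """Near-greedy decoding for a single call; mild sampling when repeating calls."""
    return 0.3 if n_runs > 1 else 1e-8


def majority_vote(verdicts: Sequence[str]) -> str:
    """Return the most common verdict (assumed aggregation step).

    Counter.most_common orders equal counts by first occurrence,
    so ties go to the verdict seen first.
    """
    if not verdicts:
        raise ValueError("no verdicts to aggregate")
    return Counter(verdicts).most_common(1)[0][0]


def critique(judge: Judge, prompt: str, n_runs: int = 1) -> str:
    """Call the judge n_runs times at the chosen temperature and aggregate."""
    if n_runs < 1:
        raise ValueError("n_runs must be >= 1")
    temperature = pick_temperature(n_runs)
    verdicts = [judge(prompt, temperature) for _ in range(n_runs)]
    return majority_vote(verdicts)


if __name__ == "__main__":
    from itertools import cycle

    # Stub judge standing in for a real LLM call (hypothetical).
    replies = cycle(["pass", "fail", "pass"])

    def fake_judge(prompt: str, temperature: float) -> str:
        return next(replies)

    print(critique(fake_judge, "Is the answer grounded in the context?", n_runs=3))  # pass
```

With a single run, the near-zero temperature makes the judge as close to deterministic as the provider allows. With several runs, the slightly higher temperature lets the samples differ, and the vote trades extra calls for a verdict that depends less on any one sample.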

Jun 28 '24 11:06 mmabrouk

This has been resolved.

Nov 20 '24 10:11 mmabrouk