Steven Hillion
Results
3
comments of
Steven Hillion
Thanks David. Let's leave this open until we've decided what to do with (1) auto-populating the test spreadsheet via a DAG, and (2) evaluating with a secondary LLM. We'll discuss...
Discussed with @davidgxue — let's consider using another LLM to evaluate the quality of responses, and then add that to the DAG so that we can have it run regularly...
@sunank200 — is this issue still relevant?