Steven Hillion

Results: 3 comments by Steven Hillion

Thanks David. Let's leave this open until we've decided what to do with (1) auto-populating the test spreadsheet via a DAG, and (2) evaluating with a secondary LLM. We'll discuss...

Discussed with @davidgxue — let's consider using another LLM to evaluate the quality of responses, and then add that to the DAG so that we can have it run regularly...

@sunank200 — is this issue still relevant?