opik [FR]: "Run/Evaluate - Compare - Edit Prompt" UI loop

Proposal summary

A way to start a Python script that waits until the UI triggers an experiment: a wait/callback function on the Python Opik() client that starts an experiment from the UI after the prompt has been edited, e.g. client.wait_on(my_evaluation_fn), or an Evaluation class with a run function.

A possible place for the UI is the experiments page: a run button and a change prompt button.

Motivation

The loop of Run/Evaluate, then Compare with previous runs, then change the prompt is the core of the process. There is currently no way to trigger this loop from the UI.

Dec 11 '24 18:12 StoyanStAtanasov

We are actually planning on building functionality to run the entire evaluation through the UI which I think might be an even nicer experience.

However, to achieve this, we are planning to support prompt templates being evaluated against a dataset and scored using either one of Opik's metrics or a custom one. Would that work for your use case, or is your evaluation task more complex than a single LLM call?

Dec 11 '24 19:12 jverre

Your plan will work for a lot of cases, and that's how we started, but now we are looking into more complicated scenarios involving DB queries and multiple LLM calls. You could do both. My proposal is simple to implement: the code will wait and poll (HTTP) or wait for a message (WebSocket) and then execute a callback. Of course, there are many ways to implement LLM application functionality, but that will take time. In the meantime, you could just augment your API and add a UI trigger.
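
To make the polling variant concrete, here is a rough sketch of what I have in mind. Nothing here is an existing Opik API: `wait_on` is just the name suggested in this issue, and `fetch_trigger` is a hypothetical stand-in for whatever HTTP request would ask the backend whether the UI has queued a run. The trigger payload shape (a dict with a `prompt` key) is also an assumption.

```python
import time
from typing import Any, Callable, Dict, List, Optional

# Assumed payload sent by the UI, e.g. {"prompt": "..."}; hypothetical shape.
Trigger = Dict[str, Any]


def wait_on(
    callback: Callable[[Trigger], Any],
    fetch_trigger: Callable[[], Optional[Trigger]],
    poll_interval: float = 2.0,
    max_polls: Optional[int] = None,
) -> List[Any]:
    """Poll for UI triggers and run `callback` once per trigger.

    `fetch_trigger` is a placeholder for an HTTP GET against a
    (hypothetical) backend endpoint; it returns None when no run
    has been requested from the UI.
    """
    results: List[Any] = []
    polls = 0
    # max_polls=None means "keep waiting until the process is stopped".
    while max_polls is None or polls < max_polls:
        polls += 1
        trigger = fetch_trigger()
        if trigger is None:
            # Nothing pending: back off before asking again.
            time.sleep(poll_interval)
            continue
        # The user's evaluation code (DB queries, several LLM calls, ...)
        # runs here with the prompt edited in the UI.
        results.append(callback(trigger))
    return results


def demo() -> List[Any]:
    """Simulate two UI-triggered runs with an in-memory queue."""
    pending: List[Optional[Trigger]] = [
        {"prompt": "Summarize: {{text}}"},
        None,
        {"prompt": "TL;DR: {{text}}"},
    ]

    def fake_fetch() -> Optional[Trigger]:
        return pending.pop(0) if pending else None

    def my_evaluation_fn(trigger: Trigger) -> str:
        return f"evaluated with prompt: {trigger['prompt']}"

    return wait_on(my_evaluation_fn, fake_fetch, poll_interval=0, max_polls=5)
```

Keeping the fetch step injectable means the same loop could later be backed by a WebSocket message instead of polling, without changing the user's callback.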

Dec 12 '24 12:12 StoyanStAtanasov

Good point @StoyanStAtanasov

@alexkuzmik This could be a nice idea, what do you think?

Dec 12 '24 20:12 jverre