🗺 prompt playground
As a user of Phoenix, I don't want to have to go back to my IDE to iterate on a prompt. I want to be able to take the data stored in Phoenix (spans, datasets) and run it through a prompt.
Use-cases
- Replay a template change on an LLM Span
- Run a template change on a dataset (a rough sketch of this workflow follows the list)
- Construct an evaluation template on a single chosen production span or dataset - the workflow is testing your evals and being able to save the result as an experiment
- Synthetic data generation - generate synthetic data or add columns to the current rows of a dataset, to help create test data
- [ ] [prompts] prompt replay on LLM Spans
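As a rough illustration of the "run a template change on a dataset" use-case (outside Phoenix itself), the sketch below renders an edited mustache template over a couple of stand-in dataset rows and sends each one to OpenAI. The `mustache` and `openai` npm packages and the example rows are assumptions, not Phoenix code.

```typescript
// Rough illustration only; not Phoenix code.
import Mustache from "mustache";
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// A revised template to try against existing dataset rows.
const template =
  "Summarize the following support ticket in one sentence:\n{{ticket}}";

// Stand-in rows; in Phoenix these would come from a dataset or from LLM spans.
const rows = [
  { ticket: "Cannot log in after password reset." },
  { ticket: "Export to CSV times out on large projects." },
];

async function replay() {
  for (const row of rows) {
    const prompt = Mustache.render(template, row);
    const completion = await client.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: prompt }],
    });
    console.log(completion.choices[0].message.content);
  }
}

replay();
```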
Planning
- [ ] #4658
- [ ] #4600
UI
- [x] #4606
- [ ] [playground][ui] credential storage
- [ ] #4803
- [ ] #4857
- [ ] [playground][ui] edit model (e.g. model selector)
- [ ] [playground][ui] mustache template editor for codemirror
- [ ] [playground][ui] f-string vs mustache template formatting toggle
- [ ] [playground][ui] f-string variable parsing utility function
- [ ] [playground][ui] mustache template variable parsing utility function (a sketch of both parsing utilities follows this list)
- [ ] [playground][ui] template variables (e.g. inputs) UI and storage
- [ ] [playground][ui] message re-ordering via drag and drop
- [ ] [playground][ui] add new message UI
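A minimal sketch of what the f-string and mustache variable-parsing utilities listed above might look like. The function names and regexes are hypothetical, not Phoenix's actual implementation.

```typescript
// Hypothetical utilities for the two variable-parsing items above; a sketch only.

// Extract {variable} names from an f-string style template, ignoring
// escaped braces ({{ and }}).
export function parseFStringVariables(template: string): string[] {
  const matches = template.matchAll(
    /(?<!\{)\{([a-zA-Z_][a-zA-Z0-9_]*)\}(?!\})/g
  );
  return Array.from(new Set(Array.from(matches, (m) => m[1])));
}

// Extract {{variable}} names from a mustache style template.
// Ignores sections and partials for simplicity.
export function parseMustacheVariables(template: string): string[] {
  const matches = template.matchAll(/\{\{\s*([a-zA-Z_][a-zA-Z0-9_.]*)\s*\}\}/g);
  return Array.from(new Set(Array.from(matches, (m) => m[1])));
}

// parseFStringVariables("Hello {name}, you have {count} alerts") -> ["name", "count"]
// parseMustacheVariables("Hello {{ name }}!") -> ["name"]
```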
API
- [ ] #4773
- [ ] #4774
- [ ] #4858
- [ ] #4859
- [ ] #4793
- [ ] #4804
- [ ] #4856
- [ ] [playground][gql] messages input to chat completion
- [ ] [playground][gql] model input to chat completion
- [ ] [playground][gql] tool definition input to chat completion (a rough input shape for these is sketched after this list)
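For context on the three input items above, here is a hypothetical TypeScript shape for the chat-completion mutation's inputs (messages, model, tool definitions). The field names and provider list are assumptions, not the actual Phoenix GraphQL schema.

```typescript
// Hypothetical input shapes; a sketch of what the GraphQL input types might
// carry, not the actual Phoenix schema.
type ChatCompletionMessageRole = "system" | "user" | "assistant" | "tool";

interface ChatCompletionMessageInput {
  role: ChatCompletionMessageRole;
  content: string;
}

interface ModelInput {
  provider: "openai" | "anthropic" | "azure_openai"; // assumed provider list
  name: string; // e.g. "gpt-4o-mini"
  temperature?: number;
  topP?: number;
}

interface ToolDefinitionInput {
  // JSON-serialized tool/function definition (e.g. an OpenAI tool schema)
  definition: string;
}

interface ChatCompletionInput {
  messages: ChatCompletionMessageInput[];
  model: ModelInput;
  tools?: ToolDefinitionInput[];
}
```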
Instrumentation
- [ ] [playground][instrumentation] semantic convention for output schema definition (JSON mode) - see the sketch after this list
- [ ] [playground][instrumentation] openai support for output schema
- [ ] [playground][instrumentation] openai node support for output schema
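A sketch of how an output-schema (JSON mode) definition could be recorded on an LLM span using the OpenTelemetry JS API. The attribute key `llm.output_schema` is a placeholder; defining the real semantic convention is exactly what the first task above covers.

```typescript
// Sketch only; the attribute key below is a placeholder, not an agreed convention.
import { trace } from "@opentelemetry/api";

const tracer = trace.getTracer("playground-example");

// Example JSON-mode response format, serialized onto the span.
const responseFormat = {
  type: "json_schema",
  json_schema: {
    name: "rating",
    schema: { type: "object", properties: { score: { type: "integer" } } },
  },
};

tracer.startActiveSpan("ChatCompletion", (span) => {
  // Placeholder attribute key; the real semantic convention would be defined
  // as part of this task.
  span.setAttribute("llm.output_schema", JSON.stringify(responseFormat));
  // ... call the LLM here ...
  span.end();
});
```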
Hi!
Enhancement proposal
This feature should be similar to #2462 but with more depth. It would involve a simple button to replicate a query into an edit mode, allowing you to replay it. Additionally, it should offer the possibility to add notes on the result iterations, such as rating the quality, output format, etc., on a scale of 1 to 10.
Goal
The goal is to facilitate quick testing of prompts and inputs, enabling evaluation and visualization of progression.
Thank you,
Alexandre
@heralight Hey! Thanks for the feedback! We have a ton of features coming out around prompt iteration, notably prompt experiments, which have evaluations built in. Stay tuned.
Noted on the replay and the annotations :) will give it some thought. We have a few ideas around replaying data through prompts, but haven't thought much about human annotations on different prompt versions. Would love to hear more.
Very nice! My ideal workflow would be:
- trace some openai calls made from code
- transform some of them into a replayable prompt, where each modification can be tested, versioned, annotated, and ranked
- the prompt can be parametrized and called from code
best,
🛝🎉