[FR]: Decouple trace and experiment results
Proposal summary
For manual evaluation loops it would be great if ExperimentItem results were less coupled to the trace. I need to be able to have multiple different ExperimentItems, each with its own output and feedback scores, for a single trace.
Basically I need a "create_experiment_item" function (see the hypothetical sketch after the snippet below).
import asyncio

import opik
from opik import ExperimentItemReferences, opik_context, track

# Opik client
opik_client = opik.Opik()

# Initialize the dataset objects
dataset = opik_client.get_or_create_dataset(DATASET_NAME)
experiment = opik_client.create_experiment(
    name=EXPERIMENT_NAME,
    dataset_name=dataset.name,
    experiment_config=EXPERIMENT_CONFIG,
)

#### MAIN LOGIC ####
items = dataset.get_items()
question_to_dataset_id = {}
for item in items:
    question_to_dataset_id[item.get("question")] = item.get("id")

@track(project_name=PROJECT_NAME, name=EXPERIMENT_NAME)
async def execute_task(item):
    trace = opik_context.get_current_trace_data()
    if trace is None:
        raise ValueError("Trace ID is not set")
    # Execute task
    for result in range(10):
        experiment.insert(
            [
                ExperimentItemReferences(
                    dataset_item_id=list(question_to_dataset_id.values())[result],
                    trace_id=trace.id,
                )
            ]
        )
    # Update output and feedback scores
    # Problem: this sets the same output for ALL experiment items,
    # since they all point at the same trace
    opik_context.update_current_trace(
        output={"Test": "x"},
        feedback_scores=[
            {
                "category_name": "Test",
                "name": "Test",
                "reason": "Test",
                "value": 1,
            }
        ],
    )

asyncio.run(execute_task(item))
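What I have in mind is roughly the following (a hypothetical sketch; create_experiment_item and its per-item output/feedback_scores parameters do not exist in the SDK today):

# Hypothetical API: one experiment item per answer, all sharing one trace
for question, dataset_item_id in question_to_dataset_id.items():
    experiment.create_experiment_item(
        dataset_item_id=dataset_item_id,
        trace_id=trace.id,
        output={"answer": f"answer to {question}"},  # output per item, not per trace
        feedback_scores=[{"name": "correctness", "value": 1}],  # scores per item
    )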
Motivation use case: answering a questionnaire. Here I have one big overarching trace, due to the way our framework is set up, that then produces answers to different questions. I want an experiment entry per answer, but only have one overarching trace. I cannot modify the code to scope the traces better.
If you have an idea for how to hack a temporary solution then I am all ears.
Relates to: #1582
Motivation
- What problem are you trying to solve? Agentic workflow: answering a questionnaire. I have one overarching trace, due to the way our framework is set up, that produces answers to different questions. I want an experiment entry per question with its own outputs and feedback scores, but only one overarching trace.
- How are you currently solving this problem? Using another tool, or writing to Excel.
- What are the benefits of this feature? It allows us to use Opik for our AI modules.
@gustavhartz I agree that experiment items today are too coupled to a dataset and to a trace. We are planning some big refactoring changes here in the next few weeks, but it's a pretty big change to the internal data structure so it will take a bit of time.
Let me think about potential workarounds; there might be something we can do.
@jverre Is there a decorator or something that can be used at the root level to capture the trace_id of function calls at a lower level? That would also be helpful for my use case.
import asyncio

import opik
from opik import ExperimentItemReferences, opik_context

# Opik client
opik_client = opik.Opik()

# Initialize the dataset objects
dataset = opik_client.get_or_create_dataset(DATASET_NAME)
experiment = opik_client.create_experiment(
    name=EXPERIMENT_NAME,
    dataset_name=dataset.name,
    experiment_config=EXPERIMENT_CONFIG,
)

#### MAIN LOGIC ####
items = dataset.get_items()
question_to_dataset_id = {}
for item in items:
    question_to_dataset_id[item.get("question")] = item.get("id")

@capture_nested_trace  # hypothetical decorator, does not exist today
async def execute_task(item):
    # Execute task WITH TRACE

    # GET LAST ROOT TRACE (hypothetical API)
    trace = opik_context.get_LAST_ROOT_TRACE_DATA()
    if trace is None:
        raise ValueError("Trace ID is not set")
    for result in range(10):
        experiment.insert(
            [
                ExperimentItemReferences(
                    dataset_item_id=list(question_to_dataset_id.values())[result],
                    trace_id=trace.id,
                )
            ]
        )
    # Update output and feedback scores
    # Still sets the output for all items sharing the trace...
    opik_context.update_current_trace(
        output={"Test": "x"},
        feedback_scores=[
            {
                "category_name": "Test",
                "name": "Test",
                "reason": "Test",
                "value": 1,
            }
        ],
    )

asyncio.run(execute_task(item))
Hope it makes sense
@jverre I would additionally need to update a trace outside of opik_context.update_current_trace, which would have to be done through the API, but I don't see an option for setting the feedback_scores there:
opik_api.traces.update_trace(
    id=trace_id,
    output=result,
)
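For the feedback scores specifically, something like the following might work with the current SDK (a sketch, assuming the client's log_traces_feedback_scores helper accepts a trace id per score dict):

import opik

opik_client = opik.Opik()

# Attach a feedback score to an existing trace by its id
opik_client.log_traces_feedback_scores(
    scores=[
        {
            "id": trace_id,  # id of the trace to score
            "name": "Test",
            "value": 1,
            "reason": "Test",
        }
    ]
)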
EDIT: Refactored my own code to make it work, but it's still a good idea to decouple.
@gustavhartz there is no such decorator, but I suppose you can use opik_context.get_current_trace_data().id to get the current trace id.
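For example, a minimal sketch (answer_question and the captured dict are illustrative names, not part of the SDK):

from opik import opik_context, track

captured = {}  # illustrative holder for the trace id

@track
def answer_question(question):
    # Inside a @track-decorated function the current trace is available,
    # so its id can be stashed for use after the call returns
    captured["trace_id"] = opik_context.get_current_trace_data().id
    return f"answer to {question}"

answer_question("What is Opik?")
# captured["trace_id"] can now be passed to ExperimentItemReferences(trace_id=...)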
@jverre was this ever solved?