opik [FR]: Decouple trace and experiment results

Proposal summary

For manual evaluation loops, it would be great if ExperimentItem results were less tightly coupled to the trace. I need to be able to have multiple ExperimentItems with different outputs and feedback scores for a single trace.

Basically, I need a "create_experiment_item" function. My current attempt looks like this:

import asyncio

import opik
from opik import ExperimentItemReferences, opik_context, track

# Opik
opik_client = opik.Opik()

# Initialize the dataset objects
dataset = opik_client.get_or_create_dataset(DATASET_NAME)
experiment = opik_client.create_experiment(
    name=EXPERIMENT_NAME,
    dataset_name=dataset.name,
    experiment_config=EXPERIMENT_CONFIG,
)

#### MAIN LOGIC ####
items = dataset.get_items()

question_to_dataset_id = {}
for item in items:
    question_to_dataset_id[item.get("question")] = item.get("id")


@track(project_name=PROJECT_NAME, name=EXPERIMENT_NAME)
async def execute_task(item):
    trace = opik_context.get_current_trace_data()
    if trace is None:
        raise ValueError("Trace ID is not set")

    # Execute task
    for result in range(10):
        experiment.insert(
            [
                ExperimentItemReferences(
                    dataset_item_id=list(question_to_dataset_id.values())[
                        result
                    ],
                    trace_id=trace.id,
                )
            ]
        )
    # Update output and feedback scores.
    # Problem: this sets the same output for all experiment items, since they share one trace.
    opik_context.update_current_trace(
        output={"Test": "x"},
        feedback_scores=[
            {
                "category_name": "Test",
                "name": "Test",
                "reason": "Test",
                "value": 1,
            }
        ],
    )

asyncio.run(execute_task(item))

Motivating use case: answering a questionnaire. Because of the way our framework is set up, I have one big overarching trace that produces answers to different questions. I want one experiment entry per answer, but I only have the one overarching trace. I cannot modify the code to scope the traces better.

If you have an idea for how to hack together a temporary solution, I am all ears.
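
To make the request concrete, here is a minimal sketch of what a decoupled experiment item could look like. It is plain Python with no Opik calls: `DecoupledExperimentItem`, `FeedbackScore`, and `create_experiment_item` are hypothetical names used only for illustration, not part of the Opik SDK, and the in-memory `store` list stands in for whatever backend would persist the items.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class FeedbackScore:
    # Hypothetical structure mirroring the score fields used in the issue.
    name: str
    value: float
    reason: str = ""


@dataclass
class DecoupledExperimentItem:
    # Hypothetical: the item owns its output and scores instead of
    # inheriting them from the trace it references.
    dataset_item_id: str
    trace_id: str
    output: Dict[str, Any]
    feedback_scores: List[FeedbackScore] = field(default_factory=list)


def create_experiment_item(
    store: List[DecoupledExperimentItem],
    dataset_item_id: str,
    trace_id: str,
    output: Dict[str, Any],
    feedback_scores: Optional[List[FeedbackScore]] = None,
) -> DecoupledExperimentItem:
    """Record one experiment item; many items may share the same trace_id."""
    item = DecoupledExperimentItem(
        dataset_item_id=dataset_item_id,
        trace_id=trace_id,
        output=output,
        feedback_scores=list(feedback_scores or []),
    )
    store.append(item)
    return item


# One overarching trace, several answers, one experiment item per answer.
store: List[DecoupledExperimentItem] = []
shared_trace_id = "trace-123"  # placeholder id for illustration
answers = {"item-1": "yes", "item-2": "no"}
for dataset_item_id, answer in answers.items():
    create_experiment_item(
        store,
        dataset_item_id,
        shared_trace_id,
        output={"answer": answer},
        feedback_scores=[FeedbackScore(name="Test", value=1.0, reason="Test")],
    )
```

The point of the sketch is only the shape of the data: each item carries its own output and feedback scores, while the trace id is just a reference that several items can share.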

Relates to: #1582

Motivation

What problem are you trying to solve? Agentic workflow: answering a questionnaire. Because of the way our framework is set up, I have one overarching trace that produces answers to different questions. I want one experiment entry per question, with its own outputs and feedback scores, but I only have the one overarching trace.
How are you currently solving this problem? Using another tool or writing to Excel.
What are the benefits of this feature? It would allow us to use Opik for our AI modules.

Mar 25 '25 12:03 gustavhartz

@gustavhartz I agree that experiment items today are too tightly coupled to a dataset and to a trace. We are planning some big refactoring changes here in the next few weeks, but it's a pretty big change to the internal data structure, so it will take a bit of time.

Let me think about potential workarounds; there might be something we can do.

Mar 25 '25 12:03 jverre

@jverre Is there a decorator or something similar that can be used at the root level to capture the trace_id of function calls at a lower level? That would also be helpful for my use case.

import asyncio

import opik
from opik import ExperimentItemReferences, opik_context, track

# Opik
opik_client = opik.Opik()

# Initialize the dataset objects
dataset = opik_client.get_or_create_dataset(DATASET_NAME)
experiment = opik_client.create_experiment(
    name=EXPERIMENT_NAME,
    dataset_name=dataset.name,
    experiment_config=EXPERIMENT_CONFIG,
)

#### MAIN LOGIC ####
items = dataset.get_items()

question_to_dataset_id = {}
for item in items:
    question_to_dataset_id[item.get("question")] = item.get("id")

@capture_nested_trace  # hypothetical decorator, does not exist in Opik
async def execute_task(item):

    # Execute task WITH TRACE

    # GET LAST ROOT TRACE (hypothetical accessor, does not exist in Opik)
    trace = opik_context.get_LAST_ROOT_TRACE_DATA()
    if trace is None:
        raise ValueError("Trace ID is not set")
    for result in range(10):
        experiment.insert(
            [
                ExperimentItemReferences(
                    dataset_item_id=list(question_to_dataset_id.values())[
                        result
                    ],
                    trace_id=trace.id,
                )
            ]
        )
    # Update output and feedback scores.
    # Problem: this sets the same output for all experiment items, since they share one trace.
    opik_context.update_current_trace(
        output={"Test": "x"},
        feedback_scores=[
            {
                "category_name": "Test",
                "name": "Test",
                "reason": "Test",
                "value": 1,
            }
        ],
    )

asyncio.run(execute_task(item))

I hope that makes sense.

Mar 25 '25 12:03 gustavhartz

@jverre I would additionally need to update a trace outside of opik_context.update_current_trace, which would have to be done through the API, but I don't see an option for setting the feedback_scores there:

opik_api.traces.update_trace(
    id=trace_id,
    output=result,
)

EDIT: I refactored my own code to make it work, but decoupling would still be a good idea.

Mar 25 '25 12:03 gustavhartz

@gustavhartz there is no decorator, but I suppose you can use opik_context.get_current_trace_data().id to get the current trace id.

Mar 25 '25 14:03 alexkuzmik

@jverre was this ever solved?

Aug 08 '25 20:08 gustavhartz
