
[FR]: Decouple trace and experiment results

Open gustavhartz opened this issue 8 months ago • 4 comments

Proposal summary

For manual evaluation loops, it would be great if ExperimentItem results were less coupled to the trace. I need to be able to have multiple ExperimentItems with different outputs and feedback scores for a single trace.

Basically, I need a "create_experiment_item" function.

import asyncio

import opik
from opik import ExperimentItemReferences, opik_context, track

# Opik
opik_client = opik.Opik()

# Initialize the dataset objects
dataset = opik_client.get_or_create_dataset(DATASET_NAME)
experiment = opik_client.create_experiment(
    name=EXPERIMENT_NAME,
    dataset_name=dataset.name,
    experiment_config=EXPERIMENT_CONFIG,
)

#### MAIN LOGIC ####
items = dataset.get_items()

question_to_dataset_id = {}
for item in items:
    question_to_dataset_id[item.get("question")] = item.get("id")


@track(project_name=PROJECT_NAME, name=EXPERIMENT_NAME)
async def execute_task(item):
    trace = opik_context.get_current_trace_data()
    if trace is None:
        raise ValueError("Trace ID is not set")

    # Execute task
    for result in range(10):
        experiment.insert(
            [
                ExperimentItemReferences(
                    dataset_item_id=list(question_to_dataset_id.values())[
                        result
                    ],
                    trace_id=trace.id,
                )
            ]
        )
    # update output and feedback scores
    # sets output for all...
    opik_context.update_current_trace(
        output={"Test": "x"},
        feedback_scores=[
            {
                "category_name": "Test",
                "name": "Test",
                "reason": "Test",
                "value": 1,
            }
        ],
    )

asyncio.run(execute_task(item))

Motivation use case: Answering a questionnaire. I have one big overarching trace, due to the way our framework is set up, which then produces answers to different questions. I want an experiment entry per answer, but only have one overarching trace. I cannot modify the code to scope the traces better.

If you have an idea for how to hack together a temporary solution, I am all ears.

Relates to: #1582

Motivation

  • What problem are you trying to solve?: Agentic workflow: Answering a questionnaire. I have one overarching trace, due to the way our framework is set up, which then produces answers to different questions. I want an experiment entry per question with outputs and feedback scores, but only have one overarching trace.
  • How are you currently solving this problem? Using another tool or writing to Excel.
  • What are the benefits of this feature? It would allow us to use Opik for our AI modules.
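
To make the request concrete, here is a minimal sketch in plain Python of the shape such a function could take. Nothing here is Opik's API: `ExperimentItem`, `InMemoryExperiment`, and `create_experiment_item` are hypothetical names used only for illustration, and items are kept in memory rather than sent to a server.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ExperimentItem:
    """Hypothetical record: one result row that carries its own output and scores."""

    dataset_item_id: str
    trace_id: str
    output: dict[str, Any]
    feedback_scores: list[dict[str, Any]] = field(default_factory=list)


class InMemoryExperiment:
    """Stand-in for an experiment (assumption); stores items locally."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.items: list[ExperimentItem] = []

    def create_experiment_item(self, dataset_item_id, trace_id, output, feedback_scores=None):
        # Output and scores live on the item, so many items can share one trace.
        item = ExperimentItem(dataset_item_id, trace_id, output, list(feedback_scores or []))
        self.items.append(item)
        return item


experiment = InMemoryExperiment("questionnaire-run")
shared_trace_id = "trace-123"  # one overarching trace for the whole questionnaire

answers = {"item-1": "Yes", "item-2": "No"}
for dataset_item_id, answer in answers.items():
    experiment.create_experiment_item(
        dataset_item_id=dataset_item_id,
        trace_id=shared_trace_id,
        output={"answer": answer},
        feedback_scores=[{"name": "Test", "value": 1, "reason": "Test"}],
    )
```

The point of the sketch is only that the output and feedback scores are attached to the experiment item instead of the trace, so a single trace id can back several items with different results.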

gustavhartz avatar Mar 25 '25 12:03 gustavhartz

@gustavhartz I agree that experiment items today are too coupled to a dataset and to a trace. We are planning some big refactoring changes here in the next few weeks, but it's a pretty big change to the internal data structure, so it will take a bit of time.

Let me think about potential workarounds; there might be something we can do.

jverre avatar Mar 25 '25 12:03 jverre

@jverre Is there a decorator or something similar that can be used at the root level to capture the trace_id of function calls at a lower level? That would also be helpful for my use case.

import asyncio

import opik
from opik import ExperimentItemReferences, opik_context, track

# Opik
opik_client = opik.Opik()

# Initialize the dataset objects
dataset = opik_client.get_or_create_dataset(DATASET_NAME)
experiment = opik_client.create_experiment(
    name=EXPERIMENT_NAME,
    dataset_name=dataset.name,
    experiment_config=EXPERIMENT_CONFIG,
)

#### MAIN LOGIC ####
items = dataset.get_items()

question_to_dataset_id = {}
for item in items:
    question_to_dataset_id[item.get("question")] = item.get("id")

# Hypothetical decorator: this is the kind of API being asked for
@capture_nested_trace
async def execute_task(item):

    # Execute task WITH TRACE

    # GET LAST ROOT TRACE (hypothetical function)
    trace = opik_context.get_LAST_ROOT_TRACE_DATA()
    if trace is None:
        raise ValueError("Trace ID is not set")
    for result in range(10):
        experiment.insert(
            [
                ExperimentItemReferences(
                    dataset_item_id=list(question_to_dataset_id.values())[
                        result
                    ],
                    trace_id=trace.id,
                )
            ]
        )
    # update output and feedback scores
    # sets output for all...
    opik_context.update_current_trace(
        output={"Test": "x"},
        feedback_scores=[
            {
                "category_name": "Test",
                "name": "Test",
                "reason": "Test",
                "value": 1,
            }
        ],
    )

asyncio.run(execute_task(item))

Hope that makes sense.

gustavhartz avatar Mar 25 '25 12:03 gustavhartz

@jverre I would additionally need to update a trace outside of opik_context.update_current_trace, which would have to be done through the API, but I don't see an option for setting the feedback_scores there:

opik_api.traces.update_trace(
    id=trace_id,
    output=result,
)

EDIT: Refactored my own code to make it work, but still a good idea to decouple

gustavhartz avatar Mar 25 '25 12:03 gustavhartz

@gustavhartz there is no decorator, but I suppose you can use opik_context.get_current_trace_data().id to get the current trace id.

alexkuzmik avatar Mar 25 '25 14:03 alexkuzmik

@jverre was this ever solved?

gustavhartz avatar Aug 08 '25 20:08 gustavhartz