
[Bug]: Assert error in `after_agent_callback` for trace_data when using Google ADK

Open dylanhogg opened this issue 7 months ago • 13 comments

What component(s) are affected?

  • [x] Python SDK
  • [ ] Opik UI
  • [ ] Opik Server
  • [ ] Documentation

Opik version

  • Opik version: 1.7.14
  • google-adk: 0.4.0

Describe the problem

Firstly, thanks for shipping the first ADK tracing integration so early.

The error occurs when I use a custom Agent `before_agent_callback` function that handles two things:

  1. Opik tracing via `opik_tracer.before_agent_callback`, and
  2. Setting custom state on the `callback_context` (initialising default user data).

With this setup, the `opik_tracer.after_agent_callback` method fails with an assertion error at `assert context_storage.get_trace_data() is self.trace_data`.

Digging further, it appears that `context_storage.get_trace_data()` returns None, while `self.trace_data` is populated.

Error line: https://github.com/comet-ml/opik/blob/996d59db86985e4ee497c68860cc41b57acec057/sdks/python/src/opik/integrations/adk/opik_tracer.py#L74

Reproduction steps and code snippets

Set a custom `before_agent_callback` that applies the Opik callback and also sets custom state, as follows:

from google.adk.agents import Agent  # needed for the Agent definition below
from google.adk.agents.callback_context import CallbackContext
from opik.integrations.adk import OpikTracer

opik_tracer = OpikTracer()

def custom_before_agent_callback(callback_context: CallbackContext) -> None:
    opik_tracer.before_agent_callback(callback_context)
    callback_context.state["user:custom_state"] = { "custom_key1": 0, ... }  # Set user state variable

example_agent = Agent(
    model="gemini-2.0-flash",
    name="Redacted",
    description="Redacted",
    instruction="Redacted",
    sub_agents=[sub_agent1, sub_agent2],
    before_agent_callback=custom_before_agent_callback,  # custom callback to set tracing and custom state
    after_agent_callback=opik_tracer.after_agent_callback,
    before_model_callback=opik_tracer.before_model_callback,
    after_model_callback=opik_tracer.after_model_callback,
    before_tool_callback=opik_tracer.before_tool_callback,
    after_tool_callback=opik_tracer.after_tool_callback,
)

Error logs or stack trace

File "/Users/<redacted_path>/lib/python3.11/site-packages/google/adk/agents/base_agent.py", line 300, in __handle_after_agent_callback
after_agent_callback_content = self.after_agent_callback(
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/<redacted_path>/lib/python3.11/site-packages/opik/integrations/adk/opik_tracer.py", line 74, in after_agent_callback
assert context_storage.get_trace_data() is self.trace_data
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

Healthcheck results

*** HEALTHCHECK STARTED ***                                                                                                                                             
Python version: 3.11.11
Opik version: 1.7.14

*** CONFIGURATION FILE ***                                                                                                                                              
Config file path: /Users/<user>/.opik.config
Config file exists: yes

*** CURRENT CONFIGURATION ***                                                                                                                                           
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Setting                          ┃ Value                      ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ api_key                          │ None                       │
│ background_workers               │ 4                          │
│ check_tls_certificate            │ True                       │
│ console_logging_level            │ INFO                       │
│ default_flush_timeout            │ None                       │
│ enable_json_request_compression  │ True                       │
│ enable_litellm_models_monitoring │ True                       │
│ file_logging_level               │ None                       │
│ file_upload_background_workers   │ 16                         │
│ logging_file                     │ opik.log                   │
│ project_name                     │ Default Project            │
│ pytest_experiment_enabled        │ True                       │
│ sentry_enable                    │ True                       │
│ track_disable                    │ False                      │
│ url_override                     │ http://localhost:5173/api/ │
│ workspace                        │ default                    │
└──────────────────────────────────┴────────────────────────────┘

*** CONFIGURATION SCAN ***                                                                                                                                              
Configuration issues: not found

*** BACKEND WORKSPACE AVAILABILITY ***                                                                                                                                  
--> Checking backend workspace availability at: http://localhost:5173/api/
Backend workspace available: yes

*** HEALTHCHECK COMPLETED ***

dylanhogg avatar May 05 '25 07:05 dylanhogg

@yaricom can you take a look at?

jverre avatar May 05 '25 08:05 jverre

Hi @dylanhogg ! I'm going to investigate this issue and come back to you with a solution ASAP. Thank you for using Opik and reporting the issue!

yaricom avatar May 05 '25 10:05 yaricom

Thanks for looking into this @yaricom - doing a bit more digging, I note that my Agent sets a state key in custom_before_agent_callback() with a "user:" prefix, which has a special meaning in ADK's session State class; you can follow the usages from here: https://github.com/google/adk-python/blob/8f94a0c7b360579b18b0cad770abb4364f8828c0/src/google/adk/sessions/state.py#L22

Thought it was worth mentioning in case the prefix is playing a part in this bug.
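For reference, a rough sketch of the prefix convention from the linked State class (the scope names here are my paraphrase, not ADK API):

```python
# Sketch of ADK's state-key prefix convention (see the linked state.py).
# Keys with these prefixes are scoped beyond the current session.
APP_PREFIX = "app:"    # shared across the whole app
USER_PREFIX = "user:"  # shared across all sessions of one user
TEMP_PREFIX = "temp:"  # temporary, not persisted

def scope_of(key: str) -> str:
    """Return the (informal) scope a state key belongs to."""
    if key.startswith(APP_PREFIX):
        return "app"
    if key.startswith(USER_PREFIX):
        return "user"
    if key.startswith(TEMP_PREFIX):
        return "temp"
    return "session"

assert scope_of("user:custom_state") == "user"
```

So `"user:custom_state"` is user-scoped rather than session-scoped, which is why the prefix seemed worth flagging.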

dylanhogg avatar May 05 '25 11:05 dylanhogg

~~As a work around, I've found that swapping the order of calls so that opik_tracer is last seems to solve this issue, however not sure why. Happy for this issue to be closed.~~

Apologies, I am still seeing the error when swapping the order of calls.

dylanhogg avatar May 05 '25 23:05 dylanhogg

Hello @dylanhogg, I see in the code you shared that you are using sub-agents. Did you also configure the OpikTracer in those sub-agents? If so, is it the same OpikTracer instance or a different one?

Lothiraldan avatar May 06 '25 09:05 Lothiraldan

Hi @Lothiraldan, yes, both sub-agents are configured with an OpikTracer, and they are different instances (the agent definitions are in separate Python files).

The example_agent above is basically the same pattern used for both the root agent and the sub-agents.

dylanhogg avatar May 06 '25 11:05 dylanhogg

Hi team, I've also reproduced this issue locally. Is there any workaround, or could a fix land soon?

Colstuwjx avatar May 07 '25 13:05 Colstuwjx

Hi @dylanhogg @Colstuwjx. We've been able to reproduce the problem when using sub_agents. It turns out that some of our current design assumptions don’t hold up when running multiple agents with the ADK integration. We're actively working on a fix. Unfortunately, there isn’t a workaround at the moment, but we’ll keep you updated as we make progress.

Lothiraldan avatar May 07 '25 14:05 Lothiraldan

There is actually one workaround: you can add the OpikTracer to only one of the agents. But you won't be able to create other instances of the OpikTracer, and you will be missing visibility into the other agents in your system.

Lothiraldan avatar May 07 '25 14:05 Lothiraldan

Thanks @Lothiraldan. I believe that not tracing the root agent (with before|after_agent_callback) and only tracing the sub-agents may work; I'll test that today. I'm not sure how it will impact the recorded data, however.

Can you give a rough indication of when this issue could be fixed? (days/weeks/months?)

Thanks very much for jumping on this, much appreciated!

dylanhogg avatar May 07 '25 23:05 dylanhogg

Thanks, Dylan, for raising this issue. As part of the prototyping team at Google, I interact with a lot of our customers who are using ADK and potentially using Opik for evals and monitoring. I think this is something that would be good to get resolved for many of them. @Lothiraldan

tanyagoogle avatar May 08 '25 04:05 tanyagoogle

I'm having similar issues with opik 1.7.18, with sub-agents and with async code. My workaround so far has been to use the decorator @opik.track(project_name=os.environ["OPIK_PROJECT_NAME"]) on the entry function, and to pass a generated thread_id via opik_context.update_current_trace(thread_id=thread_id) internally, so that the threads appear in the cloud dashboard. Cost is missing, unfortunately.

mbertani avatar May 09 '25 21:05 mbertani

Should be fixed here

alexkuzmik avatar May 13 '25 17:05 alexkuzmik

Thank you @alexkuzmik ! Do you know if the counts of input and output tokens, and the cost, are also fixed by this PR?

mbertani avatar May 14 '25 17:05 mbertani

@mbertani tokens and cost should be available for Google Gemini models, but we haven't tested it with LiteLLM yet. The PR is released, by the way. We are prioritizing improvements to the ADK integration; don't hesitate to open new issues if something is missing or you detect any problems!

alexkuzmik avatar May 14 '25 17:05 alexkuzmik

I tried opik==1.7.20 and google-adk==0.5.0, @alexkuzmik, but I still get errors. The difference from your example is probably that I'm using sub-agents with different providers: I'm mixing Gemini, Anthropic and OpenAI. The input/output tokens and cost are not calculated either.

Code:

# Cell: imports
import os
import uuid
import random, string
import dotenv
import logging

# Agent Development Kit
from google.adk.agents import LlmAgent
from google.adk.models.lite_llm import LiteLlm
from google.adk.tools import agent_tool, google_search
from google.adk.sessions import InMemorySessionService
from google.adk.runners import Runner
from google.adk.events import Event
from google.genai import types as genai_types

# Observability
import opik
from opik.integrations.adk import OpikTracer
from opik import opik_context

# --- Configure LLM Observability ---
opik_tracer = OpikTracer()

# --- configure logging ---
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# --- Constants ---
APP_NAME = "test_app"
dotenv.load_dotenv(dotenv.find_dotenv(usecwd=True, filename=".env"))
GOOGLE_API_KEY = os.environ["GOOGLE_API_KEY"] 
USER_ID = "test_user"
TEMPERATURE_WEBSEARCH=0.3
TEMPERATURE_SUMMARIZER=0.1
TEMPERATURE_RESEARCH=0.5
TEMPERATURE_WRITER=0.7

MODEL="gemini-2.5-flash-preview-04-17"

def generate_random_string(length: int = 8):
    return "".join(random.choices(string.ascii_letters + string.digits, k=length))

@opik.track(project_name=os.environ["OPIK_PROJECT_NAME"])
def call_agent(query: str, runner, user_id, session_id, thread_id):
    """Sends a query to the agent and prints the final response."""

    opik_context.update_current_trace(thread_id=thread_id)

    logger.info(f"\n>>> User Query: {query}")

    # Prepare the user's message in ADK format
    content = genai_types.Content(role="user", parts=[genai_types.Part(text=query)])

    final_response_text = "Agent did not produce a final response."  # Default

    # Key Concept: run_async executes the agent logic and yields Events.
    # We iterate through events to find the final answer.
    for event in runner.run(
        user_id=user_id, session_id=session_id, new_message=content
    ):
        # Log every event during execution (useful when debugging)
        logger.info(
            f"  [Event] Author: {event.author}, Type: {type(event).__name__}, Final: {event.is_final_response()}, Content: {event.content}"
        )

        # Key Concept: is_final_response() marks the concluding message for the turn.
        if event.is_final_response():
            if event.content and event.content.parts:
                # Assuming text response in the first part
                final_response_text = event.content.parts[0].text
                break # Final response reached, no further processing needed
            elif (
                event.actions and event.actions.escalate
            ):  # Handle potential errors/escalations
                final_response_text = f"Agent escalated: {event.error_message or 'No specific message.'}"
                break # We break also on escalation
        # Add more checks here if needed (e.g., specific error codes)
    return final_response_text


# Low-level tool-like agents
web_searcher = LlmAgent(
    name="WebSearch", 
    model=MODEL,
    description="Performs web searches for facts. Uses Google Search.",
    tools=[google_search],
    before_agent_callback=opik_tracer.before_agent_callback,
    after_agent_callback=opik_tracer.after_agent_callback,
    before_model_callback=opik_tracer.before_model_callback,
    after_model_callback=opik_tracer.after_model_callback,
    before_tool_callback=opik_tracer.before_tool_callback,
    after_tool_callback=opik_tracer.after_tool_callback,    
    generate_content_config=genai_types.GenerateContentConfig( 
        temperature=TEMPERATURE_WEBSEARCH,
    ), 
)
summarizer = LlmAgent(
    name="Summarizer", 
    model=MODEL,
    description="Summarizes text.",
    before_agent_callback=opik_tracer.before_agent_callback,
    after_agent_callback=opik_tracer.after_agent_callback,
    before_model_callback=opik_tracer.before_model_callback,
    after_model_callback=opik_tracer.after_model_callback,
    before_tool_callback=opik_tracer.before_tool_callback,
    after_tool_callback=opik_tracer.after_tool_callback,    
    generate_content_config=genai_types.GenerateContentConfig( 
        temperature=TEMPERATURE_SUMMARIZER,
    ), 
)

# Mid-level agent combining tools
research_assistant = LlmAgent(
    name="ResearchAssistant",
    #model=MODEL,
    model=LiteLlm(model="openai/gpt-4o"),
    description="Finds and summarizes information on a topic.",
    tools=[agent_tool.AgentTool(agent=web_searcher), agent_tool.AgentTool(agent=summarizer)],
    before_agent_callback=opik_tracer.before_agent_callback,
    after_agent_callback=opik_tracer.after_agent_callback,
    before_model_callback=opik_tracer.before_model_callback,
    after_model_callback=opik_tracer.after_model_callback,
    before_tool_callback=opik_tracer.before_tool_callback,
    after_tool_callback=opik_tracer.after_tool_callback,    
    generate_content_config=genai_types.GenerateContentConfig( 
        temperature=TEMPERATURE_RESEARCH,
    ), 
)

# High-level agent delegating research
report_writer = LlmAgent(
    name="ReportWriter",
    model=LiteLlm(model="anthropic/claude-3-haiku-20240307"),
    instruction="Write a report on topic X. Use the ResearchAssistant to gather information. If you need to search, use the WebSearch agent.",
    tools=[agent_tool.AgentTool(agent=research_assistant)],
    before_agent_callback=opik_tracer.before_agent_callback,
    after_agent_callback=opik_tracer.after_agent_callback,
    before_model_callback=opik_tracer.before_model_callback,
    after_model_callback=opik_tracer.after_model_callback,
    before_tool_callback=opik_tracer.before_tool_callback,
    after_tool_callback=opik_tracer.after_tool_callback,    
    generate_content_config=genai_types.GenerateContentConfig( 
        temperature=TEMPERATURE_WRITER,
    ), )

SESSION_ID = generate_random_string()
THREAD_ID = str(uuid.uuid4())
session_service = InMemorySessionService()
opik_client = opik.Opik()
session = session_service.create_session(
    app_name=APP_NAME,
    user_id=USER_ID,
    session_id=SESSION_ID,
)

runner = Runner(
    agent=report_writer, # Pass the custom orchestrator agent
    app_name=APP_NAME,
    session_service=session_service
)

Finally, call call_agent with your query (plus the runner, USER_ID, SESSION_ID and THREAD_ID from above).

Error (a bit anonymized)

--- Logging error ---
Traceback (most recent call last):
  File "....local/share/uv/python/cpython-3.11.10-macos-x86_64-none/lib/python3.11/logging/__init__.py", line 1113, in emit
    stream.write(msg + self.terminator)
  File ".../.venv/lib/python3.11/site-packages/marimo/_messaging/types.py", line 66, in write
    return self._write_with_mimetype(__s, mimetype="text/plain")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.11/site-packages/marimo/_messaging/streams.py", line 308, in _write_with_mimetype
    assert self._stream.cell_id is not None
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
Call stack:
  File "....local/share/uv/python/cpython-3.11.10-macos-x86_64-none/lib/python3.11/threading.py", line 1002, in _bootstrap
    self._bootstrap_inner()
  File "....local/share/uv/python/cpython-3.11.10-macos-x86_64-none/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File ".../.venv/lib/python3.11/site-packages/opik/message_processing/queue_consumer.py", line 26, in run
    self._loop()
  File ".../.venv/lib/python3.11/site-packages/opik/message_processing/queue_consumer.py", line 39, in _loop
    self._message_processor.process(message)
  File ".../.venv/lib/python3.11/site-packages/opik/message_processing/message_processors.py", line 57, in process
    handler(message)
  File ".../.venv/lib/python3.11/site-packages/opik/message_processing/message_processors.py", line 183, in _process_create_span_batch_message
    self._rest_client.spans.create_spans(spans=batch)
  File ".../.venv/lib/python3.11/site-packages/tenacity/__init__.py", line 338, in wrapped_f
    return copy(f, *args, **kw)
  File ".../.venv/lib/python3.11/site-packages/tenacity/__init__.py", line 480, in __call__
    result = fn(*args, **kwargs)
  File ".../.venv/lib/python3.11/site-packages/opik/rest_api/spans/client.py", line 366, in create_spans
    _response = self._raw_client.create_spans(spans=spans, request_options=request_options)
  File ".../.venv/lib/python3.11/site-packages/opik/rest_api/spans/raw_client.py", line 402, in create_spans
    _response = self._client_wrapper.httpx_client.request(
  File ".../.venv/lib/python3.11/site-packages/opik/rest_api/core/http_client.py", line 196, in request
    response = self.httpx_client.request(
  File ".../.venv/lib/python3.11/site-packages/httpx/_client.py", line 825, in request
    return self.send(request, auth=auth, follow_redirects=follow_redirects)
  File ".../.venv/lib/python3.11/site-packages/httpx/_client.py", line 914, in send
    response = self._send_handling_auth(
  File ".../.venv/lib/python3.11/site-packages/httpx/_client.py", line 942, in _send_handling_auth
    response = self._send_handling_redirects(
  File ".../.venv/lib/python3.11/site-packages/httpx/_client.py", line 979, in _send_handling_redirects
    response = self._send_single_request(request)
  File ".../.venv/lib/python3.11/site-packages/httpx/_client.py", line 1025, in _send_single_request
    logger.info(
Message: 'HTTP Request: %s %s "%s %d %s"'
Arguments: ('POST', URL('https://www.comet.com/opik/api/v1/private/spans/batch'), 'HTTP/1.1', 204, 'No Content')
--- Logging error ---
Traceback (most recent call last):
  File "....local/share/uv/python/cpython-3.11.10-macos-x86_64-none/lib/python3.11/logging/__init__.py", line 1113, in emit
    stream.write(msg + self.terminator)
  File ".../.venv/lib/python3.11/site-packages/marimo/_messaging/types.py", line 66, in write
    return self._write_with_mimetype(__s, mimetype="text/plain")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.11/site-packages/marimo/_messaging/streams.py", line 308, in _write_with_mimetype
    assert self._stream.cell_id is not None
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
Call stack:
  File "....local/share/uv/python/cpython-3.11.10-macos-x86_64-none/lib/python3.11/threading.py", line 1002, in _bootstrap
    self._bootstrap_inner()
  File "....local/share/uv/python/cpython-3.11.10-macos-x86_64-none/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File ".../.venv/lib/python3.11/site-packages/opik/message_processing/queue_consumer.py", line 26, in run
    self._loop()
  File ".../.venv/lib/python3.11/site-packages/opik/message_processing/queue_consumer.py", line 39, in _loop
    self._message_processor.process(message)
  File ".../.venv/lib/python3.11/site-packages/opik/message_processing/message_processors.py", line 57, in process
    handler(message)
  File ".../.venv/lib/python3.11/site-packages/opik/message_processing/message_processors.py", line 206, in _process_create_trace_batch_message
    self._rest_client.traces.create_traces(traces=batch)
  File ".../.venv/lib/python3.11/site-packages/tenacity/__init__.py", line 338, in wrapped_f
    return copy(f, *args, **kw)
  File ".../.venv/lib/python3.11/site-packages/tenacity/__init__.py", line 480, in __call__
    result = fn(*args, **kwargs)
  File ".../.venv/lib/python3.11/site-packages/opik/rest_api/traces/client.py", line 331, in create_traces
    _response = self._raw_client.create_traces(traces=traces, request_options=request_options)
  File ".../.venv/lib/python3.11/site-packages/opik/rest_api/traces/raw_client.py", line 355, in create_traces
    _response = self._client_wrapper.httpx_client.request(
  File ".../.venv/lib/python3.11/site-packages/opik/rest_api/core/http_client.py", line 196, in request
    response = self.httpx_client.request(
  File ".../.venv/lib/python3.11/site-packages/httpx/_client.py", line 825, in request
    return self.send(request, auth=auth, follow_redirects=follow_redirects)
  File ".../.venv/lib/python3.11/site-packages/httpx/_client.py", line 914, in send
    response = self._send_handling_auth(
  File ".../.venv/lib/python3.11/site-packages/httpx/_client.py", line 942, in _send_handling_auth
    response = self._send_handling_redirects(
  File ".../.venv/lib/python3.11/site-packages/httpx/_client.py", line 979, in _send_handling_redirects
    response = self._send_single_request(request)
  File ".../.venv/lib/python3.11/site-packages/httpx/_client.py", line 1025, in _send_single_request
    logger.info(
Message: 'HTTP Request: %s %s "%s %d %s"'
Arguments: ('POST', URL('https://www.comet.com/opik/api/v1/private/traces/batch'), 'HTTP/1.1', 204, 'No Content')

More from the logging:

19:32:43 - LiteLLM:INFO: utils.py:2826 - 
LiteLLM completion() model= gpt-4o; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= gpt-4o; provider = openai
OPIK: Failed to process message: 'CreateSpansBatchMessage' due to input data validation error:
1 validation error for SpanWrite
output
  Input should be a valid dictionary [type=dict_type, input_value='The emergence of intelli...ngular or multifaceted.', input_type=str]
    For further information visit https://errors.pydantic.dev/2.11/v/dict_type
Traceback (most recent call last):
  File ".../.venv/lib/python3.11/site-packages/opik/message_processing/message_processors.py", line 57, in process
    handler(message)
  File ".../.venv/lib/python3.11/site-packages/opik/message_processing/message_processors.py", line 174, in _process_create_span_batch_message
    rest_spans.append(span_write.SpanWrite(**cleaned_span_write_kwargs))
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.11/site-packages/pydantic/main.py", line 253, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 1 validation error for SpanWrite
output
  Input should be a valid dictionary [type=dict_type, input_value='The emergence of intelli...ngular or multifaceted.', input_type=str]
    For further information visit https://errors.pydantic.dev/2.11/v/dict_type

...

INFO:LiteLLM:
LiteLLM completion() model= claude-3-haiku-20240307; provider = anthropic
INFO:__main__:  [Event] Author: ReportWriter, Type: Event, Final: False, Content: parts=[Part(video_metadata=None, thought=None, code_execution_result=None, executable_code=None, file_data=None, function_call=None, function_response=FunctionResponse(id='toolu_01LW3nvSJfF2EZ6KkKZX2c8X', name='ResearchAssistant', response={'result': 'The emergence of intelligence encompasses its development in both biological and artificial forms. \n\n**Biological Intelligence:** It evolved as an adaptation for survival, significantly in humans around 100,000 years ago, characterized by complex brains and language. Various theories, such as the social brain hypothesis, attempt to explain this evolution. Intelligence is also evident in animals, like birds and mammals.\n\n**Artificial Intelligence (AI):** Formally established in 1956, AI has advanced notably with deep learning and transformer architectures. Its development mirrors biological intelligence, utilizing neural network concepts inspired by the brain. There is an ongoing collaboration with neuroscience to further understand the emergence of intelligence.\n\n**Theories of Intelligence:** Psychological theories explore intelligence as either a singular or multifaceted attribute, reflecting on how it develops and manifests in different contexts.'}), inline_data=None, text=None)] role='user'
19:32:47 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/gpt-4o-2024-08-06
INFO:LiteLLM:selected model name for cost calculation: openai/gpt-4o-2024-08-06
OPIK: Failed to process message: 'CreateSpansBatchMessage' due to input data validation error:
1 validation error for SpanWrite
output
  Input should be a valid dictionary [type=dict_type, input_value='The emergence of intelli... in different contexts.', input_type=str]
    For further information visit https://errors.pydantic.dev/2.11/v/dict_type
Traceback (most recent call last):
  File ".../.venv/lib/python3.11/site-packages/opik/message_processing/message_processors.py", line 57, in process
    handler(message)
  File ".../.venv/lib/python3.11/site-packages/opik/message_processing/message_processors.py", line 174, in _process_create_span_batch_message
    rest_spans.append(span_write.SpanWrite(**cleaned_span_write_kwargs))
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.11/site-packages/pydantic/main.py", line 253, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 1 validation error for SpanWrite
output
  Input should be a valid dictionary [type=dict_type, input_value='The emergence of intelli... in different contexts.', input_type=str]
    For further information visit https://errors.pydantic.dev/2.11/v/dict_type
INFO:httpx:HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"

mbertani avatar May 14 '25 17:05 mbertani

@mbertani thanks for the script! It's very valuable because agentic applications vary a lot :) We'll look into it tomorrow. (It would be nice in terms of tracking if you opened another issue, since the original one was closed.)

alexkuzmik avatar May 14 '25 18:05 alexkuzmik

I'll open a new issue. @alexkuzmik, do you want a GitHub gist that you could use to reproduce, or is this code enough?

mbertani avatar May 14 '25 19:05 mbertani

@mbertani having a reproducing script that can be executed after configuring all the necessary provider API keys would be perfect

alexkuzmik avatar May 14 '25 19:05 alexkuzmik