
add otel for perf debugging

Open Schniz opened this issue 4 months ago • 4 comments

This branch adds OpenTelemetry (OTEL) instrumentation for performance monitoring and debugging across both the Node.js backend and Go TUI components.

Key changes:

  • Added telemetry infrastructure: New telemetry modules in both packages with OTEL tracing capabilities
  • Performance instrumentation: Spans and measurements added to critical paths like message processing, API calls, and UI operations
  • Refactored timing logic: Replaced custom timing utilities with standardized OTEL spans
  • Cross-language tracing: Coordinated telemetry between Node.js server and Go TUI client

The goal is to provide better observability into performance bottlenecks and user interaction patterns within the opencode application.


I use it with the Grafana LGTM stack (with my script), but the easiest way is probably otel-desktop-viewer. I think it only supports traces, not logs or metrics, which I haven't implemented yet anyway.

An example run:

$ otel-desktop-viewer &
...
$ OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 opencode
[tui intensifies]

then go to localhost:8000 and see the traces in action :)

Schniz avatar Jul 09 '25 13:07 Schniz

new perf issues reported #805 #811

adamdotdevin avatar Jul 10 '25 01:07 adamdotdevin

Lots of conflicts, because the proper Go way would probably be to defer a span-closing method call instead of wrapping the work in a closure. Will do
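
For illustration, the two shapes being compared can be sketched as below. The `span` type and the `startSpan`, `withSpan`, and `sendMessage` names are hypothetical stand-ins that use only the standard library, so the snippet runs without the OTEL SDK; they are not taken from this branch. With the real OTEL Go API the deferred form would be `ctx, span := tracer.Start(ctx, name)` followed by `defer span.End()`.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// span is a hypothetical stand-in for an OTEL span; it only records
// its name, whether it was ended, and how long it was open.
type span struct {
	name     string
	start    time.Time
	ended    bool
	duration time.Duration
}

// startSpan is a hypothetical helper, not part of any real API.
func startSpan(name string) *span {
	return &span{name: name, start: time.Now()}
}

// End closes the span. Calling it twice is a no-op.
func (s *span) End() {
	if s.ended {
		return
	}
	s.ended = true
	s.duration = time.Since(s.start)
}

// withSpan is the closure style: the work is wrapped in a callback.
func withSpan(name string, fn func() error) (*span, error) {
	s := startSpan(name)
	err := fn()
	s.End()
	return s, err
}

// sendMessage is the defer style: the span ends on every return path,
// including the early error return, without nesting the body.
func sendMessage(text string) (s *span, err error) {
	s = startSpan("send-message")
	defer s.End()

	if text == "" {
		return s, errors.New("empty message")
	}
	return s, nil
}

func main() {
	s1, _ := withSpan("closure-style", func() error { return nil })
	s2, err := sendMessage("")
	fmt.Println(s1.name, s1.ended)
	fmt.Println(s2.name, s2.ended, err)
}
```

The deferred form keeps the function body flat, which should also make rebases less painful than re-indenting every instrumented function into a callback.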

Schniz avatar Jul 10 '25 16:07 Schniz

wrt #805: as a follow-up, when otel is set up we could report CPU usage metrics for each sub-process we invoke (like LSP)

Schniz avatar Jul 11 '25 06:07 Schniz

@adamdotdevin okay i think it's ready for a review. here's what the distributed traces look like:

image

you can see that "opencode-server" calls "opencode-tui", and then that the tui request has an "opencode-server" child span.
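
How the parent/child link crosses the process boundary is not spelled out in this thread. One common mechanism in OTEL setups is the W3C Trace Context `traceparent` header, so the sketch below assumes that format. The `traceParent` type, the helper names, and the sample IDs are illustrative and not taken from this branch; the check is simplified and does not reject all-zero IDs.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// traceParent holds the fields of a W3C traceparent header (version 00).
type traceParent struct {
	TraceID string // 32 lowercase hex chars, shared by both processes
	SpanID  string // 16 lowercase hex chars, the caller's span
	Sampled bool
}

// format renders the header value sent with an outgoing request.
func (tp traceParent) format() string {
	flags := "00"
	if tp.Sampled {
		flags = "01"
	}
	return fmt.Sprintf("00-%s-%s-%s", tp.TraceID, tp.SpanID, flags)
}

// isHex reports whether s is exactly n lowercase hex characters.
func isHex(s string, n int) bool {
	if len(s) != n || s != strings.ToLower(s) {
		return false
	}
	_, err := hex.DecodeString(s)
	return err == nil
}

// parseTraceParent is what the receiving process would do before
// starting its own span as a child of SpanID.
func parseTraceParent(h string) (traceParent, error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return traceParent{}, errors.New("unsupported traceparent")
	}
	if !isHex(parts[1], 32) || !isHex(parts[2], 16) || !isHex(parts[3], 2) {
		return traceParent{}, errors.New("malformed traceparent")
	}
	flags, _ := hex.DecodeString(parts[3])
	return traceParent{
		TraceID: parts[1],
		SpanID:  parts[2],
		Sampled: flags[0]&0x01 == 1,
	}, nil
}

func main() {
	out := traceParent{
		TraceID: "4bf92f3577b34da6a3ce929d0e0e4736",
		SpanID:  "00f067aa0ba902b7",
		Sampled: true,
	}
	header := out.format()
	in, err := parseTraceParent(header)
	fmt.Println(header, in.TraceID == out.TraceID, err)
}
```

Because both sides agree on the trace ID and the receiver uses the caller's span ID as its parent, a viewer can stitch the server and tui spans into one tree.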

some drawbacks but we can improve over time:

  • Every use of app.TelemetryContext is probably bad, because it attaches to the "root span" instead of the span where the work actually happened. We need to propagate context.Context more, instead of using context.Background() everywhere (which I replaced with app.TelemetryContext), so we get the actual propagation of component->action->trace. Might be hard, but I think we can evolve this over time
  • SSE is still not traced well
  • no implementation for metrics, which might be valuable too.

Schniz avatar Jul 11 '25 07:07 Schniz