add otel for perf debugging
This branch adds OpenTelemetry (OTEL) instrumentation for performance monitoring and debugging across both the Node.js backend and Go TUI components.
Key changes:
- Added telemetry infrastructure: New telemetry modules in both packages with OTEL tracing capabilities
- Performance instrumentation: Spans and measurements added to critical paths like message processing, API calls, and UI operations
- Refactored timing logic: Replaced custom timing utilities with standardized OTEL spans
- Cross-language tracing: Coordinated telemetry between Node.js server and Go TUI client
The goal is to provide better observability into performance bottlenecks and user interaction patterns within the opencode application.
I run it against the Grafana LGTM stack (with my script), but the easiest way is probably otel-desktop-viewer. I think it only supports traces, not logs or metrics, which I haven't implemented yet anyway.
An example run:
$ otel-desktop-viewer &
...
$ OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 opencode
[tui intensifies]
Then go to localhost:8000 to see the traces in action :)
New perf issues reported: #805, #811
Lots of conflicts, because the proper Go way would probably be to defer a span-close method rather than wrap the code in a closure. Will do.
wrt #805, a follow-up could report CPU-usage metrics for each sub-process we invoke (like LSP) when otel is set up.
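A minimal sketch of the two styles mentioned above, to show why the defer form leaves function bodies untouched. `Span`, `StartSpan`, `withSpan`, and `processMessage` are hypothetical stand-ins, not code from this branch; with the real OTEL Go API the defer form would be roughly `ctx, span := tracer.Start(ctx, "name")` followed by `defer span.End()`.

```go
package main

import "fmt"

// Span is a hypothetical stand-in for an OTEL span; it only tracks
// whether it has been ended.
type Span struct {
	Name  string
	Ended bool
}

// StartSpan begins a span (stand-in for tracer.Start).
func StartSpan(name string) *Span {
	return &Span{Name: name}
}

// End marks the span as finished.
func (s *Span) End() {
	s.Ended = true
}

// withSpan is the closure style: the traced work has to be moved into
// a function argument, which re-indents the whole body.
func withSpan(name string, fn func() int) (*Span, int) {
	s := StartSpan(name)
	n := fn()
	s.End()
	return s, n
}

// processMessage is the defer style: two lines at the top, and the rest
// of the body stays as it was. The span is ended on every return path.
func processMessage(msg string) (*Span, int) {
	s := StartSpan("process-message")
	defer s.End()
	return s, len(msg)
}

func main() {
	s1, n1 := withSpan("closure-style", func() int { return len("hello") })
	fmt.Println(s1.Name, s1.Ended, n1) // closure-style true 5

	s2, n2 := processMessage("hello")
	fmt.Println(s2.Name, s2.Ended, n2) // process-message true 5
}
```

Both forms end the span, but the defer form produces a much smaller diff, which should help with the conflicts.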
@adamdotdevin okay i think it's ready for a review. here's what the distributed traces look like:
you can see that "opencode-server" calls "opencode-tui" and then we see that the tui request has a child of "opencode-server".
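For context on how a parent/child link like this can cross a process boundary: the thread does not say which propagation format is used, so the sketch below assumes the W3C Trace Context `traceparent` header, which OTEL propagators commonly use. `formatTraceparent` and `parseTraceparent` are hypothetical helpers written for illustration; a real setup would rely on the OTEL propagators rather than hand-rolled parsing.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// formatTraceparent builds a version-00 W3C traceparent value:
// 00-<32 hex trace-id>-<16 hex parent span-id>-<2 hex flags>.
func formatTraceparent(traceID, spanID string, sampled bool) string {
	flags := "00"
	if sampled {
		flags = "01"
	}
	return fmt.Sprintf("00-%s-%s-%s", traceID, spanID, flags)
}

// parseTraceparent splits a version-00 header into its fields. It checks
// field count, version, and lengths only; it does not validate that the
// IDs are hex.
func parseTraceparent(h string) (traceID, parentID string, sampled bool, err error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 || parts[0] != "00" ||
		len(parts[1]) != 32 || len(parts[2]) != 16 || len(parts[3]) != 2 {
		return "", "", false, fmt.Errorf("malformed traceparent %q", h)
	}
	flags, err := strconv.ParseUint(parts[3], 16, 8)
	if err != nil {
		return "", "", false, fmt.Errorf("bad flags in %q: %w", h, err)
	}
	// Bit 0 of the flags byte is the "sampled" flag.
	return parts[1], parts[2], flags&1 == 1, nil
}

func main() {
	// The caller sends its trace ID and current span ID along with the request.
	h := formatTraceparent("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", true)
	fmt.Println(h)

	// The callee starts its own span in the same trace, as a child of parentID.
	traceID, parentID, sampled, err := parseTraceparent(h)
	fmt.Println(traceID, parentID, sampled, err)
}
```

The receiving side keeps the trace ID and uses the incoming span ID as the parent of its own span, which is what makes the two services appear nested in one trace.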
some drawbacks but we can improve over time:
- Every use of app.TelemetryContext is probably bad, because it attaches work to the "root span" instead of the span where it actually happened. We need to propagate context.Context more instead of using context.Background() everywhere (which I replaced with app.TelemetryContext), so that we get the actual component->action->trace propagation. Might be hard, but I think we can evolve this over time
- SSE is still not traced well
- no implementation for metrics, which might be valuable too.