Add OpenTelemetry interceptors to capture traces from gRPC communications
See #2954 - was closed for inactivity while #2997 was pending.
Type of change
- New feature
Description
Adds OpenTelemetry tracing by adding interceptors on all gRPC communications.
Related issues
This is tied to https://github.com/hyperledger/fabric-rfcs/blob/main/text/0000-opentelemetry-tracing.md
As a contributor, I just want to ask: are we going to put the packages in the vendor folder, or only in go.mod?
See https://hyperledger-fabric.readthedocs.io/en/latest/style-guides/go-style.html#adding-or-updating-go-packages
Is anyone available to review?
- No that is unrelated - tracing is distributed, each software piece will report on its own.
- Not at this time; see the RFC: https://github.com/hyperledger/fabric-rfcs/blob/main/text/0000-opentelemetry-tracing.md
@SamYuan1990 please review and approve.
Sorry @atoulme, I don't have the permission.
As I said, so far, this PR LGTM.
@atoulme, you have my approval for this PR. I can't approve it formally because of my limited permissions, but I can at least leave this message in support of adding OpenTelemetry to Fabric.
@SamYuan1990 I appreciate you getting back to me and helping vet the PR. It's deeply appreciated. I believe I have sent an email to the fabric mailing list asking for a committer to take a look, and I haven't heard back yet.
@denyeart, @jkneubuh, @yacovm please help @atoulme with this PR.
Hi @atoulme
Adding a tracking / tracing / instrumentation / observability foundation to Fabric is an incredible contribution. Thank you for advancing this PR and RFC, it is both timely and incredibly relevant. There have been several discussions and efforts underway to characterize, measure, and improve the overall network throughput and transaction processing rates for Fabric, all of which will require a systematic, high-altitude view of system observability and custom metric aggregation. In addition to gRPC level trace monitoring, there have been some recent discussions around the need / opportunity to inject trace-level or function-level call tracking of core Fabric routines to profile, isolate, and resolve system bottlenecks.
To help advance this effort, would you consider presenting the material, pros/cons, benefits, and impacts in a forum that is more suitable for an interactive discussion? This PR represents a "landscape shifting" moment for observability in Fabric networks, which, in my opinion, warrants more than a single PR for the contribution. There are several active projects starting to look at Fabric (and overlay / Level 2) networks under a performance lens, to which this addition is directly relevant. In addition to the mechanics of landing this (or related PRs), the teams looking at throughput, I/O, and performance optimization need to be aware of the benefits of system-wide observability made available with an OpenTelemetry integration.
Two good opportunities for socialization of the PR include:
- Fabric community contributor meetings. @denyeart: would you consider allocating time for Antoine and his team to present the OpenTelemetry integration hooks at our next scheduled (or next available) community call? The general team would really benefit from "seeing" some of the outcomes from this PR, rather than a direct inspection of the code and additional dependencies.
- We are in the process of incubating a Hyperledger Technical Working Group / Task Force to converge on architecture patterns to realize a Cloud Native Fabric runtime. Current topics for the Task Force include container orchestration (e.g. Kubernetes Operators), Mesh overlay networks (e.g. Istio/Linkerd), and x509 / TLS certificate management. Would you be interested in contributing to the WG / Task Force as a representative for observability and system-level monitoring?
@jkneubuh thank you for the kind words. I did present at a contributor meeting over a year ago. The time of the meeting is far from ideal for me, as it takes place at 6am, and I am not keen on presenting there again. I do, however, have a webinar with the Hyperledger Foundation to discuss this effort with Fabric and Besu. I have also opened an RFC, and there was extensive discussion of the merits of this improvement there. Overall, this process has been extremely slow. I don't understand some of the feedback on this PR: the PR itself has been open for months and is quite straightforward, in line with what the RFC recommends. In your point 2, who is "we"? I feel this is not a good place to ask me for more time or contribution. Let's stay focused on the PR, please. Please email me or contact me on Discord.
For reference, here is the CNCF document: https://github.com/cncf/tag-observability/blob/main/whitepaper.md#executive-summary. It shows how observability works and how it helps with performance enhancement.