fabric

Add OpenTelemetry interceptors to capture traces from gRPC communications

Open • atoulme opened this pull request 3 years ago • 12 comments

See #2954, which was closed for inactivity while #2997 was pending.

Type of change

  • New feature

Description

Adds OpenTelemetry tracing by installing interceptors on all gRPC communications.

Related issues

This is tied to https://github.com/hyperledger/fabric-rfcs/blob/main/text/0000-opentelemetry-tracing.md

atoulme • Jun 22 '22 21:06

As a contributor, I just want to ask: are we going to put the packages in the vendor folder, or just in go.mod?

SamYuan1990 • Jun 23 '22 11:06

See https://hyperledger-fabric.readthedocs.io/en/latest/style-guides/go-style.html#adding-or-updating-go-packages

atoulme • Jun 23 '22 23:06

Is anyone available to review?

atoulme • Jul 07 '22 17:07

  1. No, that is unrelated: tracing is distributed, so each piece of software reports on its own.
  2. Not at this time; see the RFC https://github.com/hyperledger/fabric-rfcs/blob/main/text/0000-opentelemetry-tracing.md

atoulme • Jul 08 '22 12:07

@SamYuan1990 please review and approve.

atoulme • Jul 11 '22 17:07


Sorry @atoulme, I don't have the permission. As I said, so far this PR LGTM.

SamYuan1990 • Jul 12 '22 13:07

Or, @atoulme, you have my approval for this PR. Even though my permissions don't let me approve it formally, I hope leaving this message helps support adding OpenTelemetry to Fabric.

SamYuan1990 • Jul 12 '22 15:07

@SamYuan1990 I appreciate you getting back to me and helping vet the PR. It's deeply appreciated. I believe I have sent an email to the fabric mailing list asking for a committer to take a look, and haven't heard back yet.

atoulme • Jul 12 '22 15:07

@denyeart, @jkneubuh, @yacovm, please help @atoulme.

SamYuan1990 • Jul 12 '22 15:07

Hi @atoulme

Adding a tracking / tracing / instrumentation / observability foundation to Fabric is an incredible contribution. Thank you for advancing this PR and RFC; it is both timely and incredibly relevant. There are several discussions and efforts underway to characterize, measure, and improve the overall network throughput and transaction processing rates for Fabric, all of which will require a systematic, high-altitude view of system observability and custom metric aggregation. In addition to gRPC-level trace monitoring, there have been some recent discussions around the need / opportunity to inject trace-level or function-level call tracking into core Fabric routines to profile, isolate, and resolve system bottlenecks.

To help advance this effort, would you consider presenting the material (pros/cons, benefits, and impacts) in a forum that is more suitable for an interactive discussion? This PR represents a "landscape shifting" moment for observability in Fabric networks, which, in my opinion, warrants more than a single PR for the contribution. Several active projects are starting to look at Fabric (and overlay / Level 2) networks under a performance lens, and this addition is directly relevant to them. Beyond the mechanics of landing this PR (or related PRs), the teams looking at throughput, I/O, and performance optimization need to be aware of the benefits of the system-wide observability that an OpenTelemetry integration makes available.

Two good opportunities for socialization of the PR include:

  1. Fabric community contributor meetings. @denyeart : would you consider allocating time for Antoine and his team to present the OpenTelemetry integration hooks at our next scheduled (or next available) community call? The general team would really benefit from "seeing" some of the outcomes from this PR, rather than a direct inspection of the code and additional dependencies.

  2. We are in the process of incubating a Hyperledger Technical Working Group / Task Force to converge on architecture patterns to realize a Cloud Native Fabric runtime. Current topics for the Task Force include container orchestration (e.g. Kubernetes Operators), Mesh overlay networks (e.g. Istio/Linkerd), and x509 / TLS certificate management. Would you be interested in contributing to the WG / Task Force as a representative for observability and system-level monitoring?

jkneubuh • Jul 12 '22 21:07

@jkneubuh thank you for the kind words. I did present at a contributor meeting over a year ago. The meeting time is far from ideal for me, as it takes place at 6am, and I am not keen on presenting there again. I do, however, have a webinar with the Hyperledger Foundation to discuss this effort with Fabric and Besu. I have also opened an RFC, and there was extensive discussion of the merits of this improvement there. Overall, this process has been extremely slow. I don't understand some of the feedback on this PR: the PR itself has been open for months and is quite straightforward, in line with what the RFC recommends. In your point 2, who is "we"? I feel this is not a good place to ask me for more time or contributions. Let's stay focused on the PR, please. Please email me or contact me on Discord.

atoulme • Jul 13 '22 16:07

Attaching the CNCF document https://github.com/cncf/tag-observability/blob/main/whitepaper.md#executive-summary for reference; it shows how observability works and how it can help with performance improvements.

SamYuan1990 • Jul 16 '22 03:07