remote-apis

Standardize Build Event Protocol

Open sluongng opened this issue 1 year ago • 7 comments

Today, several build event protocols exist across mature build tools, each enabling build telemetry use cases:

On top of these, many build tools and CI systems in the wild have started adopting more generic telemetry systems (OpenTelemetry, Prometheus) for their CI/CD telemetry needs:

  • https://github.com/mvisonneau/gitlab-ci-pipelines-exporter
  • https://github.com/craigatk/opentelemetry-gradle-plugin
  • https://github.com/inception-health/otel-export-trace-action
  • https://github.com/zoidyzoidzoid/gitlab-honeycomb-buildevents-webhooks-sink

So I want to start a discussion about a standardized Build Event Protocol so that different client and server implementations can agree on a common specification moving forward, and reduce overall fragmentation.

Please comment below if you are interested in adopting such a spec.

sluongng avatar Nov 01 '24 08:11 sluongng

Just to clarify: are we talking about build_event_stream.proto a.k.a. Build Event Protocol, or publish_build_event.proto a.k.a. Build Event Service?

Standardizing the latter (BES) might make sense. The former (BEP), I'm not convinced that's a good idea. The reason being that it exposes information in a schema that corresponds to Bazel's data model. For example, is it realistic to assume that Pants, Buck2, etc. etc. etc. all have the equivalent of a "ConvenienceSymlinksIdentified" event? I don't think so.

EdSchouten avatar Nov 01 '24 08:11 EdSchouten

Agree. I don't think we want to make Bazel-specific events a standardized spec.

I think a good starting point would be a new event protocol that meets all the common needs of existing tools:

  1. Creating an invocation with an ID
  2. Command line, workspace information
  3. Timing data
  4. ???

And leave an Any field for different tools to implement domain-specific events. Over time, we can identify common needs between tools (i.e. more than 2 tools interested in the same thing) and add more event types to the spec.
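To make the shape above concrete, here is a rough Python sketch, not a proposed spec. Every name in it (`EventStream`, `BuildEvent`, `InvocationStarted`, `InvocationFinished`, `ToolSpecific`, `type_url`) is hypothetical, and `ToolSpecific` only stands in for what would presumably be a protobuf Any field in a real definition.

```python
from __future__ import annotations

import json
import time
import uuid
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class InvocationStarted:
    """Common event: command line and workspace information (hypothetical)."""
    command_line: list[str]
    workspace: str


@dataclass
class InvocationFinished:
    """Common event: end of the invocation (hypothetical)."""
    exit_code: int


@dataclass
class ToolSpecific:
    """Escape hatch standing in for an Any field: an opaque, tool-defined payload."""
    type_url: str
    payload: dict


@dataclass
class BuildEvent:
    """Envelope shared by every event, regardless of which tool produced it."""
    invocation_id: str
    sequence_number: int
    timestamp_ms: int  # timing data attached to every event
    kind: str
    body: dict


class EventStream:
    """Collects the events of one invocation, ordered by sequence number."""

    def __init__(self, invocation_id: Optional[str] = None):
        # An invocation is created with an ID; generate one if none is given.
        self.invocation_id = invocation_id or str(uuid.uuid4())
        self._seq = 0
        self.events: list[BuildEvent] = []

    def emit(self, event) -> BuildEvent:
        self._seq += 1
        wrapped = BuildEvent(
            invocation_id=self.invocation_id,
            sequence_number=self._seq,
            timestamp_ms=int(time.time() * 1000),
            kind=type(event).__name__,
            body=asdict(event),
        )
        self.events.append(wrapped)
        return wrapped

    def to_json_lines(self) -> str:
        # One JSON object per line, in emission order.
        return "\n".join(json.dumps(asdict(e)) for e in self.events)


if __name__ == "__main__":
    stream = EventStream()
    stream.emit(InvocationStarted(["build", "//..."], "/path/to/workspace"))
    stream.emit(ToolSpecific("example.invalid/SomeToolEvent", {"detail": 1}))
    stream.emit(InvocationFinished(exit_code=0))
    print(stream.to_json_lines())
```

A consumer that only understands the common events could skip any `ToolSpecific` entry whose `type_url` it does not recognize, which is the property that would let different tools share one stream format.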

sluongng avatar Nov 01 '24 09:11 sluongng

cc: @philwo @aherrmann @bergsieker who might be interested in this topic.

sluongng avatar Nov 01 '24 09:11 sluongng

My concern is that if we attempt to standardize anything that is in excess of the Build Event Service, it would severely suffer from an inner-platform effect.

EdSchouten avatar Nov 01 '24 09:11 EdSchouten

You might notice that even within Google we have (at least) two different interfaces for this. When we looked at standardizing them years ago, we found that BEP/BES didn't map well onto the Chromium build lifecycle. I don't recall the details, but certainly at least part of it was due to hierarchical builds, where one build initiates another, and you want to be able both to track them separately and to provide a rollup view. Both Bazel and Chrome had too much entrenched usage to make changing them realistic.

My gut feeling here is that BEP doesn't generalize well to other tools. BES might generalize but I'm not sure. However, Bazel is unlikely to move to a new protocol due to the significant infrastructure that we've built internally around BEP.

I'd suggest exploring what this looks like when built on top of an existing framework like OpenTelemetry. It's possible there could be enough momentum from non-Bazel tools to get that off the ground, and leveraging existing open standards is good when possible.
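As a very rough sketch of how the hierarchical-build case might look in a trace-style model: the Python below does not use the OpenTelemetry SDK, it only mirrors the trace ID / span ID / parent span ID structure of its tracing model. `BuildSpan`, `descendants`, and `rollup` are hypothetical names, and treating each build invocation as one span is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class BuildSpan:
    """One build invocation, modeled like a trace span (hypothetical)."""
    trace_id: str                   # shared by the whole tree of builds
    span_id: str                    # identifies this invocation on its own
    parent_span_id: Optional[str]   # None for the top-level build
    name: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def descendants(spans: list[BuildSpan], root_id: str) -> list[BuildSpan]:
    """All builds started, directly or transitively, by the build root_id."""
    found: list[BuildSpan] = []
    pending = [root_id]
    while pending:
        current = pending.pop()
        for span in spans:
            if span.parent_span_id == current:
                found.append(span)
                pending.append(span.span_id)
    return found


def rollup(spans: list[BuildSpan], root_id: str) -> dict:
    """Summarize a build together with every build nested under it."""
    nested = descendants(spans, root_id)
    return {
        "nested_invocations": len(nested),
        "nested_duration_ms": sum(s.duration_ms for s in nested),
    }


if __name__ == "__main__":
    spans = [
        BuildSpan("t1", "root", None, "top-level build", 0, 100),
        BuildSpan("t1", "a", "root", "sub-build a", 10, 40),
        BuildSpan("t1", "b", "root", "sub-build b", 50, 90),
        BuildSpan("t1", "c", "b", "sub-build c", 55, 60),
    ]
    print(rollup(spans, "root"))
```

Each invocation stays individually addressable by its span ID, while the parent links allow a rollup over the whole tree, which are the two views mentioned for hierarchical builds.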

bergsieker avatar Nov 04 '24 15:11 bergsieker

Added ReClient's events to the issue's description.

sluongng avatar Nov 12 '24 13:11 sluongng

I've just opened up a proposal to add BES (or something equivalent) to Buck2: https://github.com/facebook/buck2/pull/806

TheGrizzlyDev avatar Nov 12 '24 16:11 TheGrizzlyDev