
API design for BuckEvent data

Open sluongng opened this issue 1 year ago • 7 comments

Today, build telemetry is sent almost entirely through Scribe (Meta's internal message queue system) into Scuba (Meta's telemetry platform).

This resulted in Buck2 implementing the EventSink trait for Scribe only:

/// A sink for events, easily plumbable to the guts of systems that intend to produce events consumeable by
/// higher-level clients. Sending an event is synchronous.
pub trait EventSink: Send + Sync {
    /// Sends an event into this sink, to be consumed elsewhere. Explicitly does not return a Result type; if sending
    /// an event does fail, implementations will handle the failure by panicking or performing some other graceful
    /// recovery; callers of EventSink are not expected to handle failures.
    fn send(&self, event: BuckEvent);

    /// Sends a control event into this sink, to be consumed elsewhere. Control events are not sent to gRPC clients.
    fn send_control(&self, control_event: ControlEvent);

    /// Collects stats on this sink (e.g. messages accepted, rejected).
    fn stats(&self) -> Option<EventSinkStats>;
}

In here:

  • BuckEvent is defined in buck_data/data.proto which is great for backward compatibility

  • ControlEvent is an enum

    /// An event that can be produced by Buck2 that is not intended to be presented to the user, but rather is used to
    /// communicate with other parts of Buck2.
    #[derive(Clone, From)]
    pub enum ControlEvent {
        /// A command result, produced upon completion of a command.
        CommandResult(Box<CommandResult>),
        /// A progress event from this command. Different commands have different types.
        PartialResult(PartialResult),
    }
    

    where CommandResult and PartialResult are both defined in buck2_cli_proto/daemon.proto. I'm not sure yet what backward-compatibility guarantee these have.

  • EventSinkStats is a struct defined in the buck2_events crate, and is not guaranteed to be backward compatible.
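
To make the shape of that trait concrete, here is a rough sketch of what a non-Scribe implementation might look like. The BuckEvent, ControlEvent and EventSinkStats types below are simplified, hypothetical stand-ins (the real BuckEvent is generated from data.proto and the real stats struct has different fields), and InMemorySink is an invented name, so treat this as an illustration only:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

// Hypothetical stand-ins for the real Buck2 types, trimmed down for illustration.
#[derive(Clone, Debug, PartialEq)]
pub struct BuckEvent {
    pub trace_id: String,
    pub payload: String,
}

#[derive(Clone, Debug)]
pub enum ControlEvent {
    CommandResult(String),
    PartialResult(String),
}

#[derive(Debug, PartialEq)]
pub struct EventSinkStats {
    pub successes: u64,
}

// Same method shapes as the trait quoted above.
pub trait EventSink: Send + Sync {
    fn send(&self, event: BuckEvent);
    fn send_control(&self, control_event: ControlEvent);
    fn stats(&self) -> Option<EventSinkStats>;
}

// A sink that only buffers events in memory and counts them.
#[derive(Default)]
pub struct InMemorySink {
    events: Mutex<Vec<BuckEvent>>,
    accepted: AtomicU64,
}

impl EventSink for InMemorySink {
    fn send(&self, event: BuckEvent) {
        self.events.lock().unwrap().push(event);
        self.accepted.fetch_add(1, Ordering::Relaxed);
    }

    fn send_control(&self, _control_event: ControlEvent) {
        // Control events are ignored in this sketch.
    }

    fn stats(&self) -> Option<EventSinkStats> {
        Some(EventSinkStats {
            successes: self.accepted.load(Ordering::Relaxed),
        })
    }
}
```

Only the BuckEvent argument of send would be covered by a protobuf compatibility story here; the other two types are the ones a third-party implementer would have to track by hand.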


It would be nice if we could define the telemetry data in a unified interface/data format similar to Bazel's BEP so that third parties could implement the server side for it. At the very least, having the data structures defined in a unified protobuf would help ensure that future changes are backward compatible.

sluongng avatar May 02 '23 16:05 sluongng

Is this what you're looking for?

https://github.com/facebook/buck2/blob/main/app/buck2_data/data.proto

You mention the data from daemon.proto, I think this is generally not data we'd expect to be used for telemetry.

krallin avatar May 02 '23 16:05 krallin

I think buck2_data/data.proto is considered a backward-compatible specification of the Buck2 events. As @krallin noted, the daemon.proto stuff isn't, but that's probably okay for this use case. For the parts of the event sink that interact with daemon.proto stuff or Rust-internal things, I'd expect that we could just add a gRPC event sink implementation that would be able to send the events to some gRPC service (i.e. something similar to https://github.com/googleapis/googleapis/blob/master/google/devtools/build/v1/publish_build_event.proto). I don't think that we'd try to match (or approximately match) the BEP directly, but an analysis of what that would look like could be interesting (I could maybe see someone prototyping some stuff with a simple proxy that converted a subset of BuckEvents into equivalent BEP events).

In addition, I think just providing a bit of documentation about the buck events (from data.proto) somewhat like that linked BEP page would be useful.

cjhopman avatar May 02 '23 18:05 cjhopman

What do you folks think about starting by creating root//app/buck2_eventserver_proto with this:

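syntax = "proto3";

// Assumes data.proto declares `package buck.data;`, so BuckEvent has to be
// referenced by its fully qualified name below.
import "data.proto";

package buck.eventserver;

// Wraps a single event, mirroring the argument of EventSink::send().
message BuckEventRequest {
  buck.data.BuckEvent event = 1;
}

message BuckEventResponse {
  // TBD
}

service EventServer {
  // Bidirectional stream: the client pushes events, the server may reply.
  rpc send(stream BuckEventRequest) returns (stream BuckEventResponse);
}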

This should be the minimal mapping to the EventSink::send() method in the current buck2_events crate. Once we have a client/server stub in place, the rest could be iterated from there.
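
One way the synchronous send() could sit on top of a streaming RPC is to push requests into a channel and let a separate task drain it into the gRPC stream. The following is only a sketch using the standard library: GrpcForwardingSink is a hypothetical name, BuckEvent is a simplified stand-in, and the actual gRPC client (which would consume the receiver) is left out entirely.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

// Hypothetical stand-in for the generated buck.data.BuckEvent type.
#[derive(Clone, Debug, PartialEq)]
pub struct BuckEvent {
    pub payload: String,
}

// Mirrors the proposed BuckEventRequest message.
#[derive(Debug, PartialEq)]
pub struct BuckEventRequest {
    pub event: BuckEvent,
}

// Hypothetical sink: queues requests for whatever drives the RPC stream.
pub struct GrpcForwardingSink {
    tx: Mutex<Sender<BuckEventRequest>>,
}

impl GrpcForwardingSink {
    // Returns the sink plus the receiving end that a streaming client would drain.
    pub fn new() -> (Self, Receiver<BuckEventRequest>) {
        let (tx, rx) = channel();
        (Self { tx: Mutex::new(tx) }, rx)
    }

    // Same shape as EventSink::send: synchronous, and no Result for the caller.
    pub fn send(&self, event: BuckEvent) {
        // If the receiver has gone away, the event is dropped instead of panicking.
        let _ = self.tx.lock().unwrap().send(BuckEventRequest { event });
    }
}
```

The channel is unbounded here to keep send() non-blocking; a real implementation would probably want a bound or a drop policy, which is one of the things the response message could eventually help with.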

sluongng avatar May 09 '23 10:05 sluongng

Are there any plans for Buck2 to support Build Event Protocol? This protocol is adopted by most remote execution services like EngFlow, BuildBuddy, BuildBarn, and so on.

burdiyan avatar Jan 14 '24 10:01 burdiyan

As we've started seeing an uptick in Buck2 interest on the BuildBuddy side, I sent https://github.com/facebook/buck2/pull/685 to kickstart some discussion about this.

The initial design is intentionally minimal so that we could get a bare minimum POC working. I hope that we can add more to the API as the usage matures over time.

The RPC was heavily inspired by the PublishBuildToolEventStream RPC on the Bazel side.

sluongng avatar Jun 17 '24 14:06 sluongng

As I look more into the current usage of ThriftScribeSink in https://github.com/facebook/buck2/pull/686/files, a few questions/thoughts pop up:

  • It's strange that send_now and send_messages_now are used in several places instead of send. I guess this is because Scribe's queue inside Meta could be slow?

  • Those two functions are also async, which makes it harder to refactor them into a separate trait.

  • Most likely, in the OSS gRPC implementation we will just stub these to call send instead, as I don't expect there to be a separate queued/non-queued API in the initial design.

sluongng avatar Jun 18 '24 10:06 sluongng

(I'll respond to that last message in the PR)

JakobDegen avatar Jul 02 '24 03:07 JakobDegen