
hack: add otel collector, prometheus, and grafana to compose file

Open jsternberg opened this issue 7 months ago • 2 comments

This adds the otel collector to the compose file and uses it to route traces and metrics to the correct services. The intent is to more closely resemble what a production pipeline for traces and metrics might look like.

Traces are forwarded from the otel collector directly to jaeger, similar to how that was configured before.

Metrics are forwarded to the prometheus exporter, which prometheus then scrapes. A grafana instance has been included to visualize the data from the prometheus instance. The default username and password are moby/moby, and the URL for the prometheus instance is http://prometheus:9090. This state, along with any dashboards, is retained in your local development environment through a volume.
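
As a rough illustration of the routing described above, a collector configuration along these lines would send traces to jaeger and expose metrics for prometheus to scrape. This is a minimal sketch, not the file from this PR: the `jaeger` host name, the ports, and the `otlp/jaeger` exporter name are assumptions, and the `prometheus` exporter assumes a collector distribution that ships it (such as the contrib image).

```yaml
# Hypothetical otel collector config; names and ports are assumptions.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # buildkit sends OTLP traces and metrics here

exporters:
  otlp/jaeger:
    endpoint: jaeger:4317        # assumed compose service name for jaeger
    tls:
      insecure: true             # local development only
  prometheus:
    endpoint: 0.0.0.0:8889       # assumed port that prometheus scrapes

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
```

With a layout like this, prometheus needs a scrape target pointing at the collector's exporter port, and grafana only ever talks to prometheus.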

jsternberg Jan 11 '24 19:01

I could not get this working without more guidance or help in hack/compose on how this should be used.

The Jaeger data seems to work, although I didn't figure out how to connect tempo to the opentelemetry collector. I did not see any metrics data, via prometheus or otherwise.

I've moved the documentation in hack/compose to a README in hack/composefiles and referenced the location in the original script. That README should describe how to use it, including the metrics.

The metrics from buildkit are sent automatically, but I can't automatically configure buildx with this, so you need to do it using some environment variables. It might be worth revisiting client metrics in the future and doing something similar to what happens with traces, but I'd like to sort out the tracing API problems in the buildkit client before doing that.

Not directly part of this PR but we should not disable unix socket access to buildkit. I'm not sure why the TCP 1234 port is needed atm as buildx/buildctl should both understand connecting via docker-container:// scheme.

Mostly because I didn't know this was a thing. I've updated the various instructions that use TCP 1234 and changed it so it uses the default unix socket.

jsternberg Jan 29 '24 16:01

@crazy-max

Network issues when starting a build using this builder:

This is weird. I don't get this when doing this locally and I'm not sure what could be causing it. The compose file uses moby/buildkit:local which is the default name for make images. If you don't use --build with the compose invocation, it'll use whatever was already present. Maybe you have something local from another branch you're accidentally using? Try using hack/compose up -d --build.

But when doing a build no metrics?

Fixed this. I copied the configuration from the docs, and it sets the cache level to high, which seems to set the caching on the metrics explorer to some ridiculous value that causes it to not requery the schema. I've changed it to none since this is a local prometheus/grafana instance for development, and we can always nuke the volume and recreate it on our local machines if it starts to cause problems.
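
For reference, a provisioned prometheus data source with caching turned off might look roughly like this. It is a sketch based on grafana's data source provisioning format, not the file from this PR: the data source name and the `isDefault` setting are placeholders, and only the URL comes from the PR description.

```yaml
# Hypothetical grafana provisioning file; only the URL is taken from this PR.
apiVersion: 1

datasources:
  - name: Prometheus             # placeholder name
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true              # assumption for a single-datasource dev setup
    jsonData:
      cacheLevel: 'None'         # the docs example uses 'High'
```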

jsternberg Feb 12 '24 23:02

Just going to close this for now since https://github.com/moby/buildkit/pull/4757 was merged and it does 90% of what I want. At the moment, I don't find the prometheus/grafana part of this useful and they're easy enough to add as a separate component so I don't really feel the need to improve this PR.

jsternberg Mar 29 '24 17:03