iceberg-go

Metrics Reporting

Open DerGut opened this issue 6 months ago • 2 comments

Feature Request / Improvement

Iceberg's Metrics Reporting API

We've started a discussion about the Metrics Reporting API in https://github.com/apache/iceberg-rust/issues/1466. It's part of the catalog spec and concerns itself with monitoring Iceberg clients' access to files in object storage, e.g. the number of files considered, scanned and skipped during scan planning (and similar counts for commits). These metrics are otherwise not visible from the catalog, and the Metrics Reporting API provides a standard interface to aggregate them across clients.

Currently, only Iceberg Java ships with an implementation for metrics reporting. While providing a pluggable interface, it comes with the default implementations LoggingMetricsReporter and RestMetricsReporter. The latter is used in combination with REST catalogs and sends recorded metrics to the catalog for server-side processing.

Existing Telemetry APIs

On the draft implementation, @sdd raised a good point (https://github.com/apache/iceberg-rust/pull/1496#issuecomment-3064213003): we now have other, often more idiomatic interfaces available. In Rust, for example, we've decided on using the metrics facade, which users can back with any exporter they like, offering simple integration with existing observability systems. In Go, OpenTelemetry offers similar functionality.

Using existing telemetry APIs, reporting code could be much simpler and integrating a backend would be easier (no custom code needed).

Metric Names

Emitting metrics straight from the library means we also need to standardize on metric names. Otherwise implementations could diverge, defeating the idea of a unified way of monitoring Iceberg clients.

I would like to propose a naming scheme similar to @sdd's PoC, consisting of

iceberg.<operation>.<resource>.<count-type>

for example iceberg.scan.data_files.scanned, iceberg.scan.delete_manifests.skipped or iceberg.commit.delete_files.added. Existing metrics can be taken from ScanMetricsResult.java and CommitMetricsResult.java.
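To make the naming scheme concrete, here is a minimal sketch in Go of a helper that builds names of the form iceberg.<operation>.<resource>.<count-type>. The function name MetricName and its validation rules are assumptions for illustration only. They are not part of iceberg-go or any spec.

```go
package main

import (
	"fmt"
	"strings"
)

// MetricName joins the three parts of the proposed scheme into
// "iceberg.<operation>.<resource>.<count-type>".
// The name and the validation rules are hypothetical, not an existing API.
func MetricName(operation, resource, countType string) (string, error) {
	parts := []string{operation, resource, countType}
	for _, p := range parts {
		// Reject empty parts and parts containing the separator or spaces,
		// so every name has exactly four dot-separated segments.
		if p == "" || strings.ContainsAny(p, ". ") {
			return "", fmt.Errorf("invalid metric name part %q", p)
		}
	}
	return "iceberg." + strings.Join(parts, "."), nil
}

func main() {
	name, err := MetricName("scan", "data_files", "scanned")
	if err != nil {
		panic(err)
	}
	fmt.Println(name)
}
```

Building names through one helper, rather than scattering string literals, would make it harder for instrumentation contributed in parallel to drift from the agreed scheme.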

Catalog Spec

The Metrics Reporting API is part of the catalog spec, which suggests that we should consider implementing it anyway. If we can show with an experiment that (for example) an OpenTelemetry exporter can consume a spec-compliant reporter interface, we should be good. If we can't, we need to take this into consideration. With the spec's API, multiple metrics are bundled together into a single report. This doesn't seem natural for other metrics APIs and could become an implementation burden.
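One way to picture the experiment is an adapter that takes a bundled report and replays it as individual counter updates. The sketch below uses only the standard library. ScanReport, MetricsReporter, Recorder, FlatteningReporter and MemoryRecorder are all hypothetical names, and the report shape is a simplification rather than the spec's actual schema. In a real setup the Recorder would presumably be backed by a telemetry library such as OpenTelemetry.

```go
package main

import (
	"fmt"
	"sort"
)

// ScanReport is a hypothetical, simplified stand-in for a bundled scan report.
// Counters are keyed by "<resource>.<count-type>", e.g. "data_files.scanned".
type ScanReport struct {
	TableName string
	Counters  map[string]int64
}

// MetricsReporter is a hypothetical report-oriented interface:
// it receives one bundle per scan.
type MetricsReporter interface {
	Report(r ScanReport)
}

// Recorder is a hypothetical minimal sink for individual counters.
type Recorder interface {
	Add(name string, value int64)
}

// FlatteningReporter implements MetricsReporter by emitting each counter
// in the bundle as a separate update on the sink.
type FlatteningReporter struct {
	Sink Recorder
}

func (f FlatteningReporter) Report(r ScanReport) {
	keys := make([]string, 0, len(r.Counters))
	for k := range r.Counters {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic emission order
	for _, k := range keys {
		f.Sink.Add("iceberg.scan."+k, r.Counters[k])
	}
}

// MemoryRecorder accumulates totals in memory, for demonstration only.
type MemoryRecorder struct {
	Totals map[string]int64
}

func (m *MemoryRecorder) Add(name string, value int64) {
	if m.Totals == nil {
		m.Totals = make(map[string]int64)
	}
	m.Totals[name] += value
}

func main() {
	rec := &MemoryRecorder{}
	var reporter MetricsReporter = FlatteningReporter{Sink: rec}
	reporter.Report(ScanReport{
		TableName: "db.events",
		Counters:  map[string]int64{"data_files.scanned": 12, "delete_manifests.skipped": 3},
	})
	fmt.Println(rec.Totals["iceberg.scan.data_files.scanned"])
}
```

The sketch only covers the direction from a bundled report to individual metrics. Reassembling a spec-style report from independently emitted metrics is the harder direction, and is where the implementation burden mentioned above would likely show up.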


I want to use this issue to:

  1. start a general discussion about metrics reporting in Go, because I find it tremendously useful when working with many clients and would like to contribute such functionality

  2. extend the discussion about following the Java implementation vs. using more idiomatic approaches, because I would like to see the different implementations moving in a similar direction

    1. find agreement on metric names if we choose this path

See also https://github.com/apache/iceberg-python/issues/474#issuecomment-3067304005 for a similar discussion in Python.

DerGut avatar Jul 13 '25 20:07 DerGut

Thanks for taking this on. I think it's a great idea for us to ensure we can align all the implementations to use the same metric names for the same things. Even better if you're willing to contribute the functionality. It's been on my list of things to do for some time and I just haven't gotten around to it.

It might be best to come up with a document or something listing all the metric names and what they should represent, so that multiple people could potentially contribute the instrumentation in parallel.

zeroshade avatar Jul 14 '25 21:07 zeroshade

Totally agree with consolidating the discussion somewhere! It took me a while but here's a doc that I just sent out to the dev list. Happy to hear your thoughts on it 🙂

DerGut avatar Jul 18 '25 16:07 DerGut