
[PIP-264] Parent issue for implementation

Open asafm opened this issue 2 years ago • 4 comments

Overview

This issue will be the parent issue tracking the implementation of PIP-264 (PR, issue that was converted to PR): Enhanced OTel-based metric system

Execution Plan

There are two parallel tracks of implementation:

  • OpenTelemetry - implementing all changes required to use OpenTelemetry in a low latency system such as Pulsar.
  • Pulsar - implementing the design per the parent PIP (264). See the complete design doc here.

Each track is composed of multiple epics. Some work requires a saga, which is itself divided into epics. Each epic contains its own breakdown into stories, and each story has its own issues. In Pulsar, many changes will require writing sub-PIPs, which will be linked from the relevant story issue.

🔄 = In progress.

OpenTelemetry track

This track is mainly done in https://github.com/open-telemetry/opentelemetry-java and https://github.com/open-telemetry/opentelemetry-specification.

  • [ ] 🔄 OTel should be performant enough
    • [x] OTel should be allocation-free on the collection path - asynchronous instruments
      • [x] Learn and draft a plan for what to change
      • [x] OTel Memory allocation test harness (JMH based)
      • [x] Run JMH before changes
      • [x] Modify the code and run JMH on it
      • [x] 🔄 Submit PR and get it approved
        • https://github.com/open-telemetry/opentelemetry-java/pull/5709
    • [x] OTel should be allocation-free on the collection path - synchronous instruments
      • [x] Exponential Histogram
        • https://github.com/open-telemetry/opentelemetry-java/pull/5998
        • https://github.com/open-telemetry/opentelemetry-java/pull/6058
        • https://github.com/open-telemetry/opentelemetry-java/pull/6136
      • [x] Explicit Histogram
        • https://github.com/open-telemetry/opentelemetry-java/pull/6153
      • [x] Counter (Sum aggregator)
        • https://github.com/open-telemetry/opentelemetry-java/pull/6182
      • [x] 🔄 Last value aggregators
        • https://github.com/open-telemetry/opentelemetry-java/pull/6196
    • [ ] OTel memory mode should be supported by all readers and exporters
    • [ ] 🔄 OTel should be allocation-free on the OTLP HTTP Exporter
    • [ ] 🔄 OTel should support a pushdown predicate
      • [x] OTel specifications should have a pushdown predicate
        • [x] Create issue
          • https://github.com/open-telemetry/opentelemetry-specification/issues/3324
        • [x] Garner favorable agreement from the community
        • [x] Write the proposal as PR
          • https://github.com/open-telemetry/opentelemetry-specification/pull/3566
        • [x] Work on PR to get it approved
      • [ ] 🔄 https://github.com/open-telemetry/opentelemetry-java/issues/6107
        • [x] Design and approve
        • [ ] 🔄 Write code with tests
        • [ ] Get PR approved
  • [ ] OTel should be bug-free for the Pulsar use case
    • [ ] https://github.com/open-telemetry/opentelemetry-java/issues/5581
    • [ ] https://github.com/open-telemetry/opentelemetry-java/issues/4901
  • [x] OTel should allow copying a resource attribute into each outgoing UTS in the Prometheus exporter
    • [x] Add proposal to OTel Specifications and approve it.
      • https://github.com/open-telemetry/opentelemetry-specification/pull/3761
    • [x] Implement specifications in OTel Java SDK
      • https://github.com/open-telemetry/opentelemetry-java/pull/6179
    • [x] 🔄 Improve AutoConfiguredOpenTelemetrySdkBuilder to allow customizing the Prometheus exporter before it is built, so Pulsar can specify the pulsar.cluster attribute to copy
    • [x] Allow converting PrometheusHttpServer to its builder so we can set the resource attributes predicate in the auto-configured SDK builder.
      • https://github.com/open-telemetry/opentelemetry-java/pull/6333#event-12343146400
  • [ ] OTel should support adding custom authentication to Prometheus exporter
    • [ ] https://github.com/open-telemetry/opentelemetry-java/issues/6013
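
As a rough illustration of the pushdown predicate item in this track: the idea is to apply a filter before a metric point is produced, so rejected series cost no collection work, rather than producing everything and dropping points afterwards. The sketch below is self-contained and does not use the OpenTelemetry SDK; `Registry`, `collect`, and the metric names are hypothetical names made up for this example, not the API proposed in the linked specification PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch of filter pushdown; not the OpenTelemetry API. */
public class PushdownFilterSketch {

    /** A toy registry that accumulates one long value per metric name. */
    static final class Registry {
        private final Map<String, Long> values = new LinkedHashMap<>();
        private int materialized = 0;

        void record(String name, long value) {
            values.merge(name, value, Long::sum);
        }

        /** Collects only the series accepted by the predicate. */
        List<String> collect(Predicate<String> keep) {
            List<String> out = new ArrayList<>();
            for (Map.Entry<String, Long> e : values.entrySet()) {
                if (!keep.test(e.getKey())) {
                    continue; // rejected before any point is built
                }
                materialized++;
                out.add(e.getKey() + "=" + e.getValue());
            }
            return out;
        }

        /** Number of points actually built across all collections. */
        int materialized() {
            return materialized;
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        registry.record("pulsar.broker.messages.in", 5);
        registry.record("pulsar.broker.messages.in", 7);
        registry.record("jvm.memory.used", 100);

        List<String> points = registry.collect(n -> n.startsWith("pulsar.broker."));
        System.out.println(points + " materialized=" + registry.materialized());
    }
}
```

With many topics per broker, the benefit of this shape is that the cost of a collection scales with the number of accepted series rather than with the total number of series.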

Pulsar track

  • [ ] 🔄 OTel scaffolding
    • [ ] 🔄 OTel should support copying Resource attributes in Prometheus Exporter
      • [x] Create a proposal and approve it: https://github.com/open-telemetry/opentelemetry-specification/pull/3761
      • [ ] Implement it in OTel Java SDK: https://github.com/open-telemetry/opentelemetry-java/issues/6108
    • [ ] 🔄 OTel layer with matching configuration, which includes Prometheus export and OTLP gRPC export
      • [x] Design PIP
      • [x] Publish and approve PIP
      • [ ] 🔄 Implement it - @dragosvictor
    • [x] Decide on naming
      • Decide on the semantic conventions and rules we will apply for the naming. We don't need a PIP for this, as we'll introduce the naming changes as part of a PIP
  • [ ] All Prometheus metrics are in OTel, except BK Metrics API, Plugin metrics, Pulsar Function Metrics
    • [ ] Messaging metrics in OTel
  • [ ] Update all Grafana dashboards.
  • [ ] Run performance test manually, verifying latency is not impacted up to 10k topics
  • [ ] Adding BK Metrics API implementation for OTel
  • [ ] Add support for Pulsar Functions.
  • [ ] Plugins support for OTel
  • [ ] Rate should be fully managed.
  • [ ] Metrics documentation rule (Forcing metrics definition to contain description and units)
  • [ ] Support metrics in OTel for 100k topics per broker
    • [ ] Introduce Topic Metrics Groups
    • [ ] Introduce Filtering
    • [ ] Performance test to make sure it works just as well as the previous solution
  • [ ] Support Pulsar authentication on OTel Prometheus exporter
  • [ ] Deprecate the Prometheus system

asafm avatar Sep 04 '23 12:09 asafm

In the OpenTelemetry track, support for object pooling, which reduces memory allocation by almost 98%, was implemented and merged, but only for asynchronous instruments. I'm starting to work on doing the same for synchronous instruments. Between those PRs, I'm using the time to design the first sub-PIP: creating the infrastructure needed for any developer to use OTel in Pulsar, including plugins.
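
To make the object pooling idea concrete, here is a minimal sketch of how reusing mutable point objects across collection cycles avoids per-cycle allocation. It is only an illustration under stated assumptions: `MutablePoint`, `PointPool`, and `collect` are hypothetical names invented for this example and are not the classes used in the merged opentelemetry-java PRs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of point reuse; not the OpenTelemetry SDK's classes. */
public class ReusablePointsSketch {

    /** A mutable metric point that is overwritten on every collection cycle. */
    static final class MutablePoint {
        int seriesId;
        long value;

        void set(int seriesId, long value) {
            this.seriesId = seriesId;
            this.value = value;
        }
    }

    /** Hands out pooled points and allocates only when the pool must grow. */
    static final class PointPool {
        private final List<MutablePoint> pool = new ArrayList<>();
        private int used = 0;
        private int allocations = 0;

        MutablePoint borrow() {
            if (used == pool.size()) {
                pool.add(new MutablePoint());
                allocations++;
            }
            return pool.get(used++);
        }

        /** Ends a collection cycle: every pooled point becomes reusable. */
        void reset() {
            used = 0;
        }

        int allocations() {
            return allocations;
        }
    }

    /** Simulates one collection cycle over n series and returns the sum of values. */
    static long collect(PointPool pool, int n, long base) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            MutablePoint p = pool.borrow();
            p.set(i, base + i);
            sum += p.value;
        }
        pool.reset();
        return sum;
    }

    public static void main(String[] args) {
        PointPool pool = new PointPool();
        for (int cycle = 0; cycle < 3; cycle++) {
            collect(pool, 1000, cycle);
        }
        // Only the first cycle allocates points; later cycles reuse them.
        System.out.println("point allocations: " + pool.allocations()); // prints 1000
    }
}
```

The trade-off in this kind of design is that points handed to an exporter are only valid until the next cycle starts, so the consumer must finish reading them (or copy what it needs) before the pool is reset.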

asafm avatar Oct 03 '23 08:10 asafm

The issue had no activity for 30 days, mark with Stale label.

github-actions[bot] avatar Nov 03 '23 01:11 github-actions[bot]

@tisonkun Can this not be marked stale all the time?

asafm avatar Nov 07 '23 13:11 asafm

@asafm I have raised the stale bot for discussion before; you can resume that discussion, and I'm in favor of disabling it.

  • https://lists.apache.org/thread/tv774jqohdpx8x0dymsskrd90xwwfvgp
  • https://lists.apache.org/thread/x2c7xod8y0wvh14nsb6bknf0dq3r9gls
  • https://lists.apache.org/thread/0woo9h53t109qsmtxsfqlcxzr16n5mn0

I may not have time to implement an alternative now, so simply disabling it makes sense to me.

tisonkun avatar Nov 07 '23 13:11 tisonkun