
Project Tracking: Performance Benchmarking SIG

Open cartersocha opened this issue 1 year ago • 24 comments

Description

As OpenTelemetry adoption grows and larger enterprises continue to deepen their usage of project components, there are persistent end-user questions about OpenTelemetry's performance impact. End-user performance varies with the quirks of each environment, but without a project performance standard and a historical data record, no one really knows whether the numbers they see are abnormal or expected. Additionally, there is no comprehensive documentation on tuning project components or on the performance trade-offs available to users, which results in a reliance on vendor support.

Project maintainers need to be able to track the current state of their components and prevent performance regressions when cutting new releases. Customers need a general sense of the potential OpenTelemetry performance impact and confidence that OpenTelemetry takes performance and customer resources seriously. Performance tracking and quantification is a project-wide need that should be addressed by a coordinated effort and by automated tooling that minimizes repo-owner effort while providing valuable new data points for all project stakeholders.

Project Board

SIG Charter

charter

Deliverables

  • Evaluate the current performance benchmarking specification, propose an updated benchmarking standard that can apply across project components, and make the requisite specification updates. The benchmarking standard should provide relevant information for maintainers and end users.
  • Develop automated tooling that can be used across project repos to report current performance numbers and track changes as new features / PRs are merged (a rough sketch follows this list).
  • Write performance tuning documentation for the project website that can help customers make actionable decisions when faced with performance trade-offs or debugging bad component performance.
  • Provide ongoing maintenance as needed on automated tooling and own the underlying assets.
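
As a very rough sketch of what the automated tooling could look like (placeholders only, not a committed design): each repo keeps its own benchmark entry point, and a CI step records timestamped results to a shared history. The script name, output format, and history location below are all assumptions.

# Hypothetical CI step (sketch only): run a repo-defined benchmark entry point
# and append a timestamped record to a shared history file.
./run-benchmarks.sh --output results.json

jq -c --arg sha "$(git rev-parse HEAD)" \
      --arg date "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
      '. + {commit: $sha, date: $date}' results.json >> benchmark-history.jsonl

The actual deliverable would standardize the entry point and result schema so the same step works across repos.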

Initial implementation scope would be the core Collector components (main repo) and the JavaScript / Java / Python SDKs with their core components. No contrib or instrumentation repos.

Staffing / Help Wanted

Anyone with an opinion on performance standards and testing.

Language maintainers or approvers, as they will be tasked with implementing the changes and following through on the process.

Required staffing

  • Lead - TBD
  • @jpkrohling - domain expert
  • @cartersocha - contributor
  • @mwear - Collector SIG
  • @codeboten - Collector SIG implementation
  • @ocelotl - Python SIG
  • @martinkuba - JavaScript
  • @tylerbenson - Java
  • @sbaum1994 - contributor

@jpkrohling - TC/GC sponsor
@alolita - TC/GC sponsor

Need: more performance domain experts
Need: maintainers or approvers from several language SIGs to participate

Meeting Times

TBD

Timeline

Initial scope is for the Collector and 3 SDKs. Output should be delivered by KubeCon NA (November 6, 2023).

Labels

tbd

Linked Issues and PRs

  • https://opentelemetry.io/docs/collector/benchmarks/
  • https://github.com/cncf/cluster/issues/245
  • https://github.com/cncf/cluster/issues/182
  • https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/performance-benchmark.md
  • https://opentelemetry.io/docs/specs/otel/performance-benchmark/

cartersocha avatar Jul 27 '23 20:07 cartersocha

@puckpuck fyi

cartersocha avatar Jul 27 '23 20:07 cartersocha

Please delete boilerplate like this from the description to make it easier to read:

A description of what this project is planning to deliver, or is in the process of delivering. This includes all OTEPs and their associated prototypes.

In general, OTEPs are not accepted unless they come with working prototypes available to review in at least two languages. Please discuss these requirements with a TC member before submitting an OTEP.

There is more like that which seems to be copied from a template and should be deleted or replaced with more specifics.

tigrannajaryan avatar Jul 27 '23 21:07 tigrannajaryan

  • Evaluate the current performance benchmarking specification

Does this refer to this document?

tigrannajaryan avatar Jul 27 '23 21:07 tigrannajaryan

cc @gsoria and @harshita19244, as they worked on performance benchmarks for SDKs at different stages (OpenTracing and OpenTelemetry) and can share their experience in doing so.

jpkrohling avatar Jul 28 '23 01:07 jpkrohling

cc @sh0rez and @frzifus, as they are interested in benchmarking the collector against other solutions.

jpkrohling avatar Jul 28 '23 01:07 jpkrohling

@cartersocha I'd be happy to be the second GC sponsor supporting this Performance Benchmarking SIG.

I recommend creating a Charter doc for this SIG to map out more details about the mission, goals, deliverables and logistics for this SIG. Let's also itemize what items are out of scope and non-goals since performance benchmarking is a subjective area for an open source project of OpenTelemetry's breadth and depth.

Please share link on this thread.

alolita avatar Aug 01 '23 06:08 alolita

Hi, I worked on the performance benchmarking project to compare the performance of the OpenTracing and OpenTelemetry libraries as part of my Outreachy internship. All tests were executed on bare metal machines. Please find the GitHub repo here: https://github.com/harshita19244/opentelemetry-java-benchmarks. Feel free to reach out to me if you have questions.

harshita19244 avatar Aug 01 '23 18:08 harshita19244

Over in PHP SIG, we've implemented (most of) the documented perf tests, but what I think we lack is a way to run them on consistent hardware, and a way to publish the results (or compare to a benchmark to track regressions/improvements).

brettmc avatar Aug 02 '23 12:08 brettmc

@brettmc already made an ask for bare metal machines that was approved. I’ll share the details once we get them: https://github.com/cncf/cluster/issues/245

cartersocha avatar Aug 02 '23 16:08 cartersocha

Thx @cartersocha for starting this!

Anyone with an opinion on performance standards and testing.

I would be super interested in participating.

Recently @sh0rez started a project to compare grafana-agent and Prometheus-agent performance in collecting metrics. Since it's quite flexible, it wasn't too hard to extend it to include the OpenTelemetry Collector. Maybe it's beneficial for this project; happy to chat about it.

frzifus avatar Aug 02 '23 18:08 frzifus

Would love to see the data / results or hear about any testing done here @frzifus. Thanks for being willing to share your work 😎

cartersocha avatar Aug 02 '23 19:08 cartersocha

Added a charter to the proposal as @alolita suggested.

cartersocha avatar Aug 07 '23 20:08 cartersocha

:+1:

ocelotl avatar Aug 22 '23 17:08 ocelotl

Looking forward to seeing this go forward! cc @tobert

vielmetti avatar Aug 24 '23 19:08 vielmetti

Hey @frzifus @sh0rez @harshita19244 @gsoria @brettmc, we now have bare metal machines to run tests on. I wasn’t sure how to add all of you on Slack, but we’re in the #otel-benchmarking channel on the CNCF Slack.

https://cloud-native.slack.com/archives/C05PEPYQ5L3

cartersocha avatar Aug 24 '23 20:08 cartersocha

In Java we've taken performance fairly seriously, and continue to make improvements as we receive feedback. For example, we received an issue about a use case in which millions of distinct metric series may need to be maintained in memory, and feedback that the SDK at the time produced problematic memory churn. Since receiving it, we've worked to reduce metric memory allocation by 80%, and there is work in progress to reduce it by 99.9% (essentially zero memory allocations after the metric SDK reaches a steady state). We also have performance test suites for many sensitive areas and validate that changes to those areas don't degrade performance.

All this is to say that I believe we have a decent performance story today.

However, where I think we could improve is in performance documentation we can point curious users to. Our performance test suites require quite a bit of context to run and to interpret the results. It would be great if we could extend the spec performance benchmark document to include high-level descriptions of some use cases for each signal, and to provide tooling to run and publish performance results to some central location.

If the above was available, we would have some nice material to point users to who are evaluating the project. We would still keep the nuanced performance tests around for sensitive areas, but it would be good to have something simpler / higher level.

In general, I think performance engineering is going to be very language / implementation dependent. I would caution against too expansive of a scope for a cross-language performance group. It would be great to provide some documentation of use cases to evaluate in suites, and tooling for running on bare metal / publishing results. But there are always going to be nuanced language specific concerns. I think we should raise those issues with the relevant SIGs, and let those maintainers / contributors work out solutions.

jack-berg avatar Sep 08 '23 21:09 jack-berg

My position is similar to @jack-berg's.

Taking OpenTelemetry .NET as an example, performance has been taken seriously from the beginning:

  • Stress test https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/test/OpenTelemetry.Tests.Stress.Metrics
  • Benchmarks https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/test/Benchmarks
  • Zero heap allocation is enforced on hot paths https://github.com/open-telemetry/opentelemetry-dotnet/blob/b870ed9b0c965ec89bf6b5aedab87ff3cab8ea68/test/Benchmarks/Metrics/HistogramBenchmarks.cs#L3

Thinking about what could potentially benefit OpenTelemetry .NET: having some perf numbers published to an official document on opentelemetry.io across all programming languages might increase discoverability.

reyang avatar Sep 14 '23 19:09 reyang

Thanks for the context, all. @jack-berg, could you share where the Java tests are published and what compute they run on? @reyang, could you share what compute you rely on in .NET, and would you consider migrating the test results to the OTel website like the Collector does?

cartersocha avatar Sep 18 '23 17:09 cartersocha

The tests are scattered throughout the repo in directories next to the source they evaluate. All the directories contain "jmh". I wrote a quick little script to find them all:

find . -type d | grep "^.*\/jmh$" | grep -v ".*\/build\/.*"

# Results
./context/src/jmh
./exporters/otlp/all/src/jmh
./exporters/otlp/common/src/jmh
./extensions/trace-propagators/src/jmh
./extensions/incubator/src/jmh
./sdk/metrics/src/jmh
./sdk/trace/src/jmh
./sdk/logs/src/jmh
./api/all/src/jmh

They run on each developer's local machine, and only on request. The basic idea is that maintainers / approvers know which areas of the code are sensitive and have JMH test suites. When someone opens a PR that we suspect has performance implications, we ask them to run the performance suite before and after and compare the results (example). It's obviously imperfect, but has generally been fine.
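
For illustration, the before/after run for a single module looks roughly like this (the Gradle task path is assumed from the directory layout above, and the branch name is just a placeholder):

# Sketch of a before/after JMH comparison for one module
git checkout main
./gradlew :sdk:metrics:jmh      # baseline numbers on unmodified code
git checkout my-change          # the PR branch under review (placeholder name)
./gradlew :sdk:metrics:jmh      # candidate numbers for the same suite
# then compare the two JMH result summaries by hand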

It would be good if there was an easy way to run a subset of these on stable compute and publish the results to a central place. I think running / publishing all of them might be overwhelming.

jack-berg avatar Sep 18 '23 21:09 jack-berg

Makes sense. Thanks for sharing those details. Let me start a thread in the CNCF Slack to coordinate machine access.

cartersocha avatar Sep 18 '23 21:09 cartersocha

A random find I just stumbled across is a k6 extension for generating OTel signals, created by an ING Bank engineer: https://github.com/thmshmm/xk6-opentelemetry

I'm not sure what the guidelines on usage of 3rd party tooling are for the Performance Benchmarking SIG.

cwegener avatar Sep 20 '23 03:09 cwegener

Thanks for sharing @cwegener! The guidelines are still to be defined, so we’ll see, but the general preference is for community tooling (which can also be donated). We’re a decentralized project and each language has its quirks, so whatever guidelines are defined would be more of a baseline. If you think this approach would be generally beneficial, we’d love to hear more. Feel free to cross-post in the #otel-benchmarking channel.

cartersocha avatar Sep 20 '23 19:09 cartersocha

If you think this approach would be generally beneficial we’d love to hear more.

I will test drive the k6 extension myself a little bit and report back in Slack.

cwegener avatar Sep 21 '23 08:09 cwegener

@cartersocha do you mind converting this issue to a PR? We are now placing proposals here: https://github.com/open-telemetry/community/tree/main/projects

tedsuo avatar Sep 27 '23 14:09 tedsuo