opentelemetry-collector-contrib icon indicating copy to clipboard operation
opentelemetry-collector-contrib copied to clipboard

Upcoming changes to JMX metrics gatherer

Open breedx-splk opened this issue 1 year ago • 3 comments

Component(s)

receiver/jmx

Describe the issue you're reporting

Hi collector folks! Greetings from the Java instrumentation group.

We have a java contrib component commonly called the JMX Metrics Gatherer which allows users to choose from a set of preconfigured out-of-the-box frameworks and/or provide custom configurations for their own JMX MBeans. This component runs as a stand-alone jar file, connects to a JVM, and exports metrics. We believe it is bundled or otherwise launched from the jmxreceiver in this repository.

For some time, we (otel java sig) have had a confusing and limiting situation which involves two components with similar goals but notably different implementations and configuration: jmx auto-instrumentation via the agent, and the above-mentioned jmx-metrics gatherer.

We want to begin unifying these separate implementations into something more cohesive. Furthermore, the jmx-metrics gatherer is written in and configured with groovy, and there is little interest in maintaining that going forward. So, the approach we discussed today in the Java SIG meeting today is this -- we will create a new module in java-contrib (name is tbd) that will essentially be a new version of the jmx-metrics gatherer. This will be written in Java and will use the same yaml based configuration as the java agent instrumentation. Once that module is published and we think it is ready to be used widely, we will remove the existing jmx-metrics module and stop publishing updates to it.

So this issue is really to:

  1. let collector contrib maintainers/contributors know that this change is coming (timeline tbd)
  2. solicit feedback and provoke discussion about the approach described above
  3. learn more from collector folks about how widely used the jmx receiver is and any other existing challenges with it.

Thanks!

breedx-splk avatar Aug 23 '24 00:08 breedx-splk

Pinging code owners:

  • receiver/jmx: @rmfitzpatrick

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] avatar Aug 23 '24 00:08 github-actions[bot]

For example for the items 2) and 3), we already know the following:

The current JMX gatherer (contrib) implementation also includes some pre-defined sets of metrics which are selected through the otel.jmx.target.system (doc)

One of the goals is being able to re-implement the current set of metrics with the YAML configuration instead of the groovy definitions, so ideally it could be a drop-in replacement for people relying on it.

In practice though, end-users are also allowed to provide their own groovy metrics definitions with otel.jmx.groovy.script config option, for this use-case a migration from groovy to yaml will be necessary. This "breaking change" is the main reason that makes us believe creating a new component is better than changing the current implementation.

In other terms, for those pre-defined metrics, do we have any idea if using the otel.jmx.groovy.script is frequent or not ?

SylvainJuge avatar Aug 23 '24 08:08 SylvainJuge

From a cursory glance at the code, I don't think otel.jmx.groovy.script is configurable through the receiver. The jmx gatherer config is completely generated by the receiver in this function, and I don't see any reference to it.

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/2d5b62d99b9940c2dde01e0c50016be46177e970/receiver/jmxreceiver/receiver.go#L171

So if otel.jmx.target.system is still supported in some form, it seems like it might not necessarily be a breaking change?

BinaryFissionGames avatar Aug 26 '24 13:08 BinaryFissionGames

From a cursory glance at the code, I don't think otel.jmx.groovy.script is configurable through the receiver.

It's not ... it was actually specifically taken out due to security concerns (it basically allows the collector to run an arbitrary groovy script -> essentially arbitrary code execution)

This proposed code change would allow customers of the collector to change what's queried, an in general be a huge usability increase

hughesjj avatar Sep 09 '24 18:09 hughesjj

So if otel.jmx.target.system is still supported in some form, it seems like it might not necessarily be a breaking change?

Not directly related to the collector or the receiver, but we have to still assume that existing users are launching the jar file through some other mechanism, in which case it's still "breaking" to them. Just sayin'.

breedx-splk avatar Sep 10 '24 21:09 breedx-splk

Why implement the JMX scraper in Java, thereby requiring to bundle java within the otelcol image? Why not implement the scraper in Go directly (using something like gojmx), removing any Java dependency? We could remove one layer of complexity here.

choucavalier avatar Oct 23 '24 07:10 choucavalier

Why implement the JMX scraper in Java, thereby requiring to bundle java within the otelcol image? Why not implement the scraper in Go directly (using something like gojmx), removing any Java dependency? We could remove one layer of complexity here.

On a high level, the JMX Scraper has the following goals:

  • provide the ability to capture JMX metrics in an identical way as the JMX Insights feature of the Java instrumentation agent (which comes with YAML definition for metrics to capture)
  • provide a drop-in replacement of JMX Gatherer by providing an identical/very close set of metrics that allow to reuse it with the jmxreceiver collector extension
  • allow to deprecate the JMX Gatherer that relies on groovy scripts for metrics definitions which is inherently "unsafe by nature" and does not allow to align metrics with the JMX Insights definitíons (from the otel collector side, there is no way to use an arbitrary groovy script for custom metrics due to security implications).

Using something from gojmx can simplify the otel collector extension, but it's very likely that not all features of the JMX protocol are supported (clients & server certificates, authentication, using alternative JMXMP protocol instead of RMI, ...). From experience dealing with those in Java is already tricky, so doing that with a completely independent protocol implementation is expected to be an order of magnitude harder.

I would be happy to be proven wrong on this last aspect, and maybe it could also be interesting to create another otel collector extension that relies on gojmx, even if not all the advanced features/complexities of the JMX protocol are not supported it could provide something useful and remove the dependency on a JVM that might work for some users with simple setups. As usual, feel free to contribute another implementation, especially if you are already familiar with the otel collector and gojmx.

SylvainJuge avatar Oct 23 '24 08:10 SylvainJuge

Thanks @SylvainJuge for your very detailed answer to my naive question. I'm not familiar with the whole complexity you just described. I guess it makes sense then.

choucavalier avatar Oct 23 '24 09:10 choucavalier

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

github-actions[bot] avatar Dec 23 '24 03:12 github-actions[bot]

The new JMX scrapper artifact is now published, so the next step here is making it possible to use it with the jmxreceiver to enable getting feedback on it.

This allows to provide a drop-in replacement and still capture the same metrics as the ones currently defined with the JMX gatherer.

In the future, those metrics definitions will be defined in a single location (likely in instrumentation-java repository), which will allow to capture exactly the same metrics. As part of this alignment step, the metrics definitions will have to be modified but we will likely provide a compatibility option to allow preserving compatibility with their current definitions.

SylvainJuge avatar Jan 02 '25 09:01 SylvainJuge

FYI the next steps to allow usage of jmx-scraper with jmxreceiver is further described in https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/37469.

SylvainJuge avatar Jan 24 '25 15:01 SylvainJuge

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

  • receiver/jmxreceiver: @atoulme @rogercoll

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] avatar Mar 26 '25 03:03 github-actions[bot]

Update on current status

  • nothing has been done yet on the jmxreceiver in this repository
  • the JMX Gatherer remains as-is for now
  • the JMX scraper can now be used as a replacement of JMX Gatherer with metrics that are ~99% the same modulo some minor differences (few extra attributes, few attribute values without string case normalization)

An alternative to modifying the jmxreceiver implementation in this repo could be to make JMX Gatherer include a copy of JMX Scraper and silently delegate to it, however this is not ideal due to the minor metrics differences.

Also, we must provide support for a few new configuration options, so it is better to properly add support for JMX Scraper in collector:

  • ability to switch between "legacy" metrics definitions that are close to JMX Gatherer and the "new" that are defined directly in OpenTelemetry Java Instrumentation repository
  • ability to provide a custom yaml file to capture metrics that are not supported out of the box

SylvainJuge avatar Mar 26 '25 08:03 SylvainJuge

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

  • receiver/jmxreceiver: @atoulme @rogercoll

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] avatar May 27 '25 03:05 github-actions[bot]

This issue is not stale, I hope we might get some progress on it soon.

As a kind of "last resort" option, if unable to make changes to the collector directly we could replace the underlying implementation of jmx-gatherer with jmx-scraper, but it would not help with deprecating jmx-gatherer long term.

SylvainJuge avatar May 27 '25 07:05 SylvainJuge

https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/40954 from @rogercoll has been merged which means it is now possible to use jmx-scraper with collector 🎉 .

It is still not the default, so it still requires to explicit opt-in configuration, however it allows evaluating the transition to scraper and future deprecation of gatherer.

The next steps on the collector side would be:

  • ask for feedback on using jmxreceiver with jmx-scraper which is now possible (maybe add this to the collector sig meeting agenda)
  • deal with any potential issues reported with it
  • make jmx-scraper the default option

SylvainJuge avatar Jul 17 '25 11:07 SylvainJuge

In order to help with the migration, I have added with migration instructions in https://github.com/open-telemetry/opentelemetry-java-contrib/pull/2034 (not merged as time of writing), linking to it could be relevant to help with future migration and changing the default implementation.

SylvainJuge avatar Jul 18 '25 12:07 SylvainJuge

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

  • receiver/jmxreceiver: @atoulme @rogercoll

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] avatar Sep 17 '25 03:09 github-actions[bot]

bump

breedx-splk avatar Sep 18 '25 23:09 breedx-splk