opentelemetry-java-contrib icon indicating copy to clipboard operation
opentelemetry-java-contrib copied to clipboard

Refactor & merge JMX gatherer & insight implementations

Open SylvainJuge opened this issue 1 year ago • 13 comments

As discussed in June 20th Java SIG meeting, we have two distinct ways to capture JMX metrics:

  • with instrumentation agent through the JMX insight feature
  • with contrib through the JMX gatherer (in this repository)

As a consequence, we have:

  • JMX gatherer is using groovy scripts for metrics definition
  • JMX insight is using YAML files for metrics definition
  • both JMX insight and JMX gatherer allow to provide custom user-provided configuration using respectively a groovy script and a yaml file
  • JMX gatherer provides ready-to-use metrics configuration for some target systems (tomcat, kafka, cassandra, ...)
  • JMX insight provides ready-to-use metrics configuration for a few of the target systems supported by JMX gatherer.

In addition to the duplication of code, the metrics definition themselves are duplicated in two different incompatible formats (YAML and Groovy scripts), which makes the maintenance even harder and further complicates their evolution. For example trying to change definitions in instrumentation like in opentelemetry-java-instrumentation#11621 becomes more complicated than it should.

JMX gatherer allows to capture JMX metrics from outside of the JVM and is used in the collector-contrib repository. Usages of it can be found by searching for usages of otel.jmx.service.url on GH.

JMX gatherer is used in the opentelemetry-collector-contrib repository by the jmxreceiver

Proposal

  • reuse YAML-based implementation of JMX insight (instrumentation) for JMX gatherer implementation
  • replace target system metrics definition with YAML equivalents
  • drop support for user-provided groovy scripts, they would have to be replaced by YAML equivalent
  • for now keep the metrics definitions identical to their current state to preserve compatibility
  • aligning any metrics definition differences (if there is any) would be handled through follow-up updates to the YAML definitions.

Removing the user-provided groovy script support is the only breaking change here, the rationale is:

  • it is still possible (and easier) to capture custom metrics with YAML, so the feature is preserved
  • maintenance of YAML files is expected to be simpler and allows to keep insight/gatherer in sync
  • usage of otel.jmx.groovy.script in opentelemetry-collector-contrib was removed 2 years ago for security reasons (opentelemetry-collector-contrib#6750 and opentelemetry-collector-contrib#9721)
  • searching for references to otel.jmx.groovy.script on GH returns mostly point to old forks of the opentelemetry-collector-contrib, so we can infer this feature is probably rarely used

All of this assumes that there are no limitations of the YAML metrics definition to represent the current groovy script definitions. In case of unsolvable incompatibilities we will have to decide on a case-by-case basis.


Implementation steps

The current path to implement this brings a few breaking changes so we decided to create a new "JMX Scraper" component that will provide an alternative to "JMX Gatherer" for an easier migration.

  • [x] bootstrap a new JMX Scraper module: #1445
  • [x] allow reusing JMX Insights with remote connections https://github.com/open-telemetry/opentelemetry-java-instrumentation/pull/12178 (part of instrumentation 2.9.0 release).
  • [x] implement & test SSL connection: server certificate + SSL RMI registry
  • [x] test SSL connection: client certificate
  • [x] implement and test for a single target system tomcat with definitions that match tomcat.groovy https://github.com/open-telemetry/opentelemetry-java-contrib/pull/1485
  • [x] publish and document JMX Scraper "MVP" for feedback and test https://github.com/open-telemetry/opentelemetry-java-contrib/pull/1586
  • [x] maybe add ability to aggregate some mbean metrics to replicate https://github.com/open-telemetry/opentelemetry-java-contrib/pull/1366 in contrib, however this might also be covered by better metrics definition that use MBean key/values.
  • [x] implement and test other target systems
    • [x] activemq #1497
    • [x] cassandra #1515
    • [x] hadoop #1675
    • [x] hbase #1538
    • [x] jetty #1517
    • [x] jvm #1485
    • [x] kafka-consumer #1670
    • [x] kafka-producer #1670
    • [x] kafka #1670
    • [x] solr #1595
    • [x] tomcat #1485
    • [x] wildfly #1531
  • [x] document usage and configuration #1651
  • [ ] make otel collector jmx receiver able to use it https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/37469
  • [x] refactor assertions to provide easy debugging of test failures
  • [x] refactor configure to use autoconfiguration (support for env variables, yaml config, ...) #1651

SylvainJuge avatar Jul 03 '24 13:07 SylvainJuge