
Improve monitoring

Open tombentley opened this issue 6 years ago • 6 comments

Currently it is difficult to look inside a JVM running in a pod to understand things like garbage collection. The story is inconsistent between different images:

  • For kafka, zookeeper and kafka connect images we provide the JMX exporter, which, if users install prometheus, would give them visibility of JVM metrics.
  • Kafka, zookeeper and kafka connect images have gc logging enabled.
  • For TC and CC we provide essentially nothing.

From a development perspective it's a lot of faff to have to deploy prometheus and grafana just to see a graph of some metrics. It would be more useful to be able to use a tool like visualvm and kubectl port-forward. There are a couple of issues with doing that:

  • jstatd, which would give visualvm better insight into the JVM, is part of the JDK, not the JRE, so it's difficult to arrange for that tool to be available in the images.

  • It's possible to use visualvm with JMX (though it's less capable than through jstatd) by passing system properties to the java command:

      -Dcom.sun.management.jmxremote.port=9999 \
      -Dcom.sun.management.jmxremote.rmi.port=9999 \
      -Dcom.sun.management.jmxremote.local.only=false \
      -Dcom.sun.management.jmxremote.authenticate=false \
      -Dcom.sun.management.jmxremote.ssl=false \
      -Djava.rmi.server.hostname=127.0.0.1
    

    but this also requires the port to be EXPOSEd in the Dockerfile, and we probably should make this sufficiently configurable that it supports SSL and authentication.
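To make the JMX route concrete, here is a minimal sketch of a client that reads garbage collection counters over a JMX connection. It assumes the JVM in the pod was started with the properties listed above (port 9999, no authentication, no SSL) and that the port has been forwarded to the local machine with kubectl port-forward. The class name `JmxGcProbe` and its helper methods are hypothetical names chosen for illustration; only the standard `java.lang.management` and `javax.management` APIs are used.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper, not part of Strimzi: prints GC statistics read over JMX.
public class JmxGcProbe {

    // Standard JMX-over-RMI URL for a JVM whose registry and RMI port are both `port`.
    public static String jmxUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    // Returns one line per garbage collector MXBean visible through the connection.
    public static List<String> gcSummary(MBeanServerConnection conn) throws Exception {
        List<String> lines = new ArrayList<>();
        // Matches every bean named java.lang:type=GarbageCollector,name=...
        ObjectName pattern = new ObjectName(
                ManagementFactory.GARBAGE_COLLECTOR_MXBEAN_DOMAIN_TYPE + ",*");
        for (ObjectName name : conn.queryNames(pattern, null)) {
            GarbageCollectorMXBean gc = ManagementFactory.newPlatformMXBeanProxy(
                    conn, name.getCanonicalName(), GarbageCollectorMXBean.class);
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            // No host/port given: inspect this JVM, which is enough to check the code runs.
            gcSummary(ManagementFactory.getPlatformMBeanServer()).forEach(System.out::println);
            return;
        }
        // Assumption: args are <host> <port>, e.g. 127.0.0.1 9999 after a port-forward.
        JMXServiceURL url = new JMXServiceURL(jmxUrl(args[0], Integer.parseInt(args[1])));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            gcSummary(connector.getMBeanServerConnection()).forEach(System.out::println);
        }
    }
}
```

Run with `java JmxGcProbe.java 127.0.0.1 9999` while the port-forward is active. Because this sketch connects without credentials or TLS, it only works against the unauthenticated, non-SSL settings shown above; a configuration that enables either would need an environment map passed to `JMXConnectorFactory.connect`.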

Ideally whatever we do will be configured consistently across all the images.

tombentley avatar May 01 '18 09:05 tombentley

Triaged 22.2.2022: This can be configured using environment variables, and the EXPOSE is not needed in the Dockerfile. All that is needed here is to describe it in the (developer) documentation.

scholzj avatar Feb 22 '22 15:02 scholzj

This would be a really useful feature. Does it trigger a rolling update? In that case, it couldn't be used with hard-to-reproduce issues.

fvaleri avatar Feb 22 '22 15:02 fvaleri

Yes, it requires a rolling update if you enable it.

scholzj avatar Feb 22 '22 16:02 scholzj

@scholzj you wrote:

This can be configured using environment variables

I assume you meant that in this case it'd be the environment variable KAFKA_JMX_OPTS. Could you please elaborate on what the correct way to set it would be?

alonpr avatar Oct 13 '22 01:10 alonpr

++ please elaborate

aidan-melen avatar Jan 13 '23 22:01 aidan-melen

++ here too.

f3r73ch avatar Jun 15 '23 14:06 f3r73ch