Monitor Kafka Streams application state out of the box

Open pszymczyk opened this issue 9 months ago • 9 comments

Feature request

I would like to monitor my Kafka Streams application state without extra programming effort.

Rationale

Monitoring the Kafka Streams application state is one of the basic metrics for operating Kafka Streams applications. An application that enters the REBALANCING state too often, or gets stuck in the ERROR state, is what really matters for Kafka Streams observability.

Additional context

By default, Kafka Streams reports the application state among its top-level client metrics. The library reports enum values:

Image

Because the reported value is not numeric, Micrometer filters this metric out in the KafkaMetrics meter binder.

Image

Proposed solution

Since every Kafka Streams state also has a standard numeric representation, we can translate enum -> double and report the Kafka Streams application state out of the box. We can extend the current KafkaMetrics meter binder implementation by adding an if statement for the state metric and registering a gauge:

return stateMetric -> {
    KafkaStreams.State state = (KafkaStreams.State) stateMetric.metricValue();

    return switch (state) {
        case CREATED -> 0;
        case RUNNING -> 2;
        case REBALANCING -> 1;
        case PENDING_SHUTDOWN -> 3;
        case NOT_RUNNING -> 4;
        case ERROR -> 5;
        default -> -1;
    };
};

Additional resources

If you find this feature useful, I would like to provide an implementation.

pszymczyk avatar Apr 03 '25 09:04 pszymczyk

Any update here?

pszymczyk avatar May 08 '25 17:05 pszymczyk

Thank you for the issue! I'm not a huge fan of mapping an enum to numerical values, especially since that mapping is not public in Kafka, but I guess there is nothing better we can do to support this.

Please feel free to create a PR. /cc @shakuzen

jonatan-ivanov avatar May 09 '25 00:05 jonatan-ivanov

Essentially, what we need for this is what OpenMetrics defines as a StateSet. I think we could do something similar with a MultiGauge. I would avoid magic numbers.

Taking a step back, how do you anticipate using this metric in your metrics backend or in alerts or charts? That might help better inform what makes sense or is most usable.

shakuzen avatar May 09 '25 06:05 shakuzen

@shakuzen I provided a sample dashboard in the issue description. Have you looked here: https://www.responsive.dev/blog/monitoring-kafka-streams#bonus ?

Image

In my current setup I have 3 things:

  1. A histogram similar to the one I attached
  2. A metric component showing the current Kafka Streams state with color mapping: state 2 is green, state 5 is red, and so on
  3. Alerting based on the current state

pszymczyk avatar May 15 '25 09:05 pszymczyk

@shakuzen I have some spare time now and I'd like to work on this issue. I am not sure how to avoid magic numbers and introduce StateSet. Any advice is welcome :)

pszymczyk avatar Oct 02 '25 19:10 pszymczyk

> I am not sure how to avoid magic numbers

A Kafka Stream can only be in one state at a time, so you can have a time series for each possible state via a tag identifying the state (e.g. state=CREATED) and only the state it is in will have a value of 1 while all other states will have a value of 0. That is how OpenMetrics StateSet works.

> introduce StateSet

That's a harder task because I don't think other metrics formats have the concept of StateSet, so I'm not sure it makes sense to introduce it as a top-level concept in Micrometer. Although we could publish it as a gauge type in formats that don't have a StateSet, since it is just a specialized gauge type. As I mentioned, you can simulate it with MultiGauge without introducing a new Meter type. I wrote the following test demonstrating it:

@Test
void enumMultiGauge() {
    State currentState = State.CREATED;
    MultiGauge kafkaStreamsState = MultiGauge.builder("kafka.streams.state")
            .tag("client.name", "client1")
            .register(registry);

    // One row per possible state: 1 for the current state, 0 for all others.
    kafkaStreamsState.register(Arrays.stream(State.values())
            .map(state -> Row.of(Tags.of("state", state.name()), state == currentState ? 1 : 0))
            .collect(Collectors.toList()));

    System.out.println(((SimpleMeterRegistry) registry).getMetersAsString());
}

private enum State {
    CREATED, RUNNING, NOT_RUNNING;
}

This has the output:

kafka.streams.state(GAUGE)[client.name='client1', state='CREATED']; value=1.0
kafka.streams.state(GAUGE)[client.name='client1', state='NOT_RUNNING']; value=0.0
kafka.streams.state(GAUGE)[client.name='client1', state='RUNNING']; value=0.0

Or in OpenMetrics format:

# TYPE kafka_streams_state gauge
# HELP kafka_streams_state  
kafka_streams_state{client_name="client1",state="CREATED"} 1.0
kafka_streams_state{client_name="client1",state="NOT_RUNNING"} 0.0
kafka_streams_state{client_name="client1",state="RUNNING"} 0.0
# EOF

Maybe there is some API we could add to make it easier to do this with a given enum and current value. Or we could consider adding StateSet, but I believe this is the first time it has been requested. We would generally want to wait for more use cases and demand before adding something like that.
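As a rough illustration of what such a helper could look like, here is a minimal sketch in plain Java with no Micrometer or Kafka dependency. The class name StateSetSketch, the method stateSet, and the stand-in State enum are all hypothetical and are not part of any existing API; the enum only mirrors the states used in the proposed switch earlier in this issue.

```java
import java.util.EnumMap;
import java.util.Map;

public class StateSetSketch {

    // Stand-in for KafkaStreams.State so the sketch compiles without Kafka on the classpath (assumption).
    enum State { CREATED, REBALANCING, RUNNING, PENDING_SHUTDOWN, NOT_RUNNING, ERROR }

    /**
     * Hypothetical helper: returns one sample per enum constant,
     * 1.0 for the current value and 0.0 for every other constant.
     */
    static <E extends Enum<E>> Map<E, Double> stateSet(Class<E> type, E current) {
        Map<E, Double> samples = new EnumMap<>(type);
        for (E constant : type.getEnumConstants()) {
            samples.put(constant, constant == current ? 1.0 : 0.0);
        }
        return samples;
    }

    public static void main(String[] args) {
        // Print one line per state, in enum declaration order.
        stateSet(State.class, State.REBALANCING)
                .forEach((state, value) -> System.out.println("state=" + state + " value=" + value));
    }
}
```

Each entry of the returned map could then be turned into a MultiGauge row tagged with the state name, in the same way as the test above, and re-registered whenever the state changes.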

shakuzen avatar Oct 07 '25 06:10 shakuzen

What about reporting simple numeric values? The KafkaStreams.State documentation gives an official mapping:

Image

I like this MultiGauge approach because we can see the actual state name, but I am not sure how hard it will be to visualize this metric, for example in Grafana. Numbers are really straightforward.

pszymczyk avatar Oct 18 '25 19:10 pszymczyk

The "official" mapping is not part of the public API of the Kafka client. You cannot get it from the Kafka client, so it seems to be an implementation detail we should not depend on (not sure why it is in the javadoc). Also, it's kind of obscure: when you see current.state = 5, how is one supposed to know what it means?

The output that Tommy suggested above can be used to create dashboards and visualize state transitions. If you plot just the metric without filtering any tags, you should see which state is the current one. Grafana also supports enums (including transformations) and state timelines. Please check whether that works for you.

jonatan-ivanov avatar Oct 20 '25 19:10 jonatan-ivanov

@jonatan-ivanov makes sense 👍 . Thank you guys for the support, now everything is clear and I have no doubts about how this should be implemented. I will try to provide a PR with an implementation soon :)

pszymczyk avatar Oct 20 '25 19:10 pszymczyk