Monitor Kafka Streams application state out of the box
Feature request
I would like to monitor my Kafka Streams application state without extra programming effort.
Rationale
Application state is one of the basic metrics for operating Kafka Streams applications. An application that goes into the REBALANCING state too often, or that is stuck in the ERROR state, is what really matters in Kafka Streams observability.
Additional context
Kafka Streams reports the application state by default as a top-level client metric. The library reports enum values:
Because the reported value is not numeric, Micrometer filters this metric out in the KafkaMetrics meter binder.
Proposed solution
Since every Kafka Streams state also has a standard numeric representation, we can translate enum -> double and report the Kafka Streams application state out of the box. We can extend the existing KafkaMetrics meter binder implementation by adding an if statement for the state metric and registering a gauge:
return stateMetric -> {
    KafkaStreams.State state = (KafkaStreams.State) stateMetric.metricValue();
    return switch (state) {
        case CREATED -> 0;
        case REBALANCING -> 1;
        case RUNNING -> 2;
        case PENDING_SHUTDOWN -> 3;
        case NOT_RUNNING -> 4;
        case ERROR -> 5;
        default -> -1;
    };
};
Additional resources
If you find this feature useful, I would like to provide an implementation.
Any update here?
Thank you for the issue! I'm not a huge fan of mapping an enum to numerical values especially since that mapping is not public in Kafka but I guess there is nothing better we can do to support this.
Please feel free to create a PR. /cc @shakuzen
Essentially what we need for this is what OpenMetrics defines as a StateSet. We could I think do something similar with a MultiGauge, perhaps. I would avoid magic numbers.
Taking a step back, how do you anticipate using this metric in your metrics backend or in alerts or charts? That might help better inform what makes sense or is most usable.
@shakuzen I provided a sample dashboard in the issue description. Have you looked here: https://www.responsive.dev/blog/monitoring-kafka-streams#bonus ?
In my present setup I have 3 things:
- Histogram similar to what I have attached
- Metric component showing the current Kafka Streams state with color mapping -> 2nd state is green, 5th state is red and so on
- Alerting based on present state.
@shakuzen I have some spare time now and I'd like to work on this issue, I am not sure how to avoid magic numbers and introduce StateSet, any advice is welcome :)
I am not sure how to avoid magic numbers
A Kafka Stream can only be in one state at a time, so you can have a time series for each possible state via a tag identifying the state (e.g. state=CREATED) and only the state it is in will have a value of 1 while all other states will have a value of 0. That is how OpenMetrics StateSet works.
introduce StateSet
That's a harder task because I don't think other metrics formats have the concept of StateSet, so I'm not sure it makes sense to introduce it as a top-level concept in Micrometer. Although we could publish it as a gauge type in formats that don't have a StateSet, since it is just a specialized gauge type. As I mentioned, you can simulate it with MultiGauge without introducing a new Meter type. I wrote the following test demonstrating it:
@Test
void enumMultiGauge() {
    State currentState = State.CREATED;
    MultiGauge kafkaStreamsState = MultiGauge.builder("kafka.streams.state")
        .tag("client.name", "client1")
        .register(registry);
    kafkaStreamsState.register(Arrays.stream(State.values())
        .map(state -> Row.of(Tags.of("state", state.name()), state == currentState ? 1 : 0))
        .collect(Collectors.toList()));
    System.out.println(((SimpleMeterRegistry) registry).getMetersAsString());
}

private enum State {
    CREATED, RUNNING, NOT_RUNNING;
}
This has the output:
kafka.streams.state(GAUGE)[client.name='client1', state='CREATED']; value=1.0
kafka.streams.state(GAUGE)[client.name='client1', state='NOT_RUNNING']; value=0.0
kafka.streams.state(GAUGE)[client.name='client1', state='RUNNING']; value=0.0
Or in OpenMetrics format:
# TYPE kafka_streams_state gauge
# HELP kafka_streams_state
kafka_streams_state{client_name="client1",state="CREATED"} 1.0
kafka_streams_state{client_name="client1",state="NOT_RUNNING"} 0.0
kafka_streams_state{client_name="client1",state="RUNNING"} 0.0
# EOF
Maybe there is some API we could add to make it easier to do this with a given enum and current value. Or we could consider adding StateSet, but this is I believe the first time it is being requested. We would generally want to wait for more use cases and demand before adding something like that.
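For illustration only, here is a rough sketch of what such a helper might compute for a given enum and current value. It deliberately uses no Micrometer or Kafka types: `StateSetSketch`, `encode`, and the local `State` enum are hypothetical names made up for this example, not an existing or proposed API. The returned map has one entry per enum constant, with 1.0 for the current state and 0.0 for all others, which is the same shape as the rows registered on the MultiGauge in the test above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Micrometer or Kafka.
final class StateSetSketch {

    // Stand-in for KafkaStreams.State so the sketch has no Kafka dependency.
    enum State { CREATED, REBALANCING, RUNNING, PENDING_SHUTDOWN, NOT_RUNNING, ERROR }

    // Returns one entry per enum constant, in declaration order:
    // 1.0 for the current state, 0.0 for every other state.
    static <E extends Enum<E>> Map<String, Double> encode(Class<E> type, E current) {
        Map<String, Double> rows = new LinkedHashMap<>();
        for (E candidate : type.getEnumConstants()) {
            rows.put(candidate.name(), candidate == current ? 1.0 : 0.0);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Each entry would become one gauge row tagged with state=<name>.
        encode(State.class, State.RUNNING)
            .forEach((state, value) -> System.out.println("state=" + state + " value=" + value));
    }
}
```

In a real binder, each map entry would presumably be turned into a `Row` with a `state` tag and re-registered whenever the state changes. An alert on the error state then only has to check whether the `state=ERROR` series equals 1, with no numeric code to decode.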
What about reporting simple numeric values? In the KafkaStreams.State documentation we have an official mapping:
I like this MultiGauge approach because we can see the actual state name, but I am not sure how hard it will be to visualize this metric, for example in Grafana. Numbers are really straightforward.
The "official" mapping is not part of the public API of the Kafka client. You cannot get it from the Kafka client; it seems to be an implementation detail we should not depend on (not sure why it is in the javadoc). Also, it's kind of obscure: when you see current.state = 5, how is one supposed to know what it means?
The output that Tommy suggested above can be used to create dashboards and visualize state transitions. If you plot just the metric without filtering any tags you should see which is the current state. Grafana also supports enums (including transformations) and state timelines, please check if it works for you.
@jonatan-ivanov makes sense 👍 . Thank you guys for support, now everything is clear and I have no doubts about how this should be implemented. I will try to provide a PR with implementation soon :)