kafka-ui
Consumers tab shows wrong lag value when messages are produced transactionally
Issue submitter TODO list
- [x] I've looked up my issue in FAQ
- [x] I've searched for already existing issues here
- [x] I've tried running main-labeled docker image and the issue still persists there
- [x] I'm running a supported version of the application which is listed here
Describe the bug (actual behavior)
Hi Team,
Congrats on the great tool!
Not sure if what I am going to describe should be considered as a feature request or a bug. Happy to have this moved to a feature request.
We are observing wrong data being displayed in the Consumer Lag column in the Consumers tab, when messages are being produced transactionally.
When a transaction ends, Kafka writes a special end-of-transaction marker to each partition involved, and that marker occupies an offset of its own. The marker is not visible in the messages list and is never delivered to consumers, but it is counted in the lag.
This incorrect lag persists until a producer writes a message to the partition without using a transaction.
All my best, Krum.
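The effect described above can be reproduced with a toy model. This is a minimal sketch, not kafka-ui or Kafka client code: the `Partition` class and `consume_all` function are hypothetical, and it assumes that lag is computed as the log end offset minus the committed offset, and that a consumer commits the offset of its last delivered message plus one.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Toy model of one partition: each list entry occupies one offset."""
    records: list = field(default_factory=list)  # "data" or "marker"

    @property
    def end_offset(self) -> int:
        # Log end offset: the offset the next record would receive.
        return len(self.records)

    def produce(self, count: int, transactional: bool) -> None:
        self.records += ["data"] * count
        if transactional:
            # Committing the transaction appends one marker record.
            self.records.append("marker")


def consume_all(partition: Partition) -> int:
    """Return the offset committed after reading every visible message."""
    data_offsets = [i for i, kind in enumerate(partition.records) if kind == "data"]
    # Committed offset = offset of the last delivered message + 1.
    return data_offsets[-1] + 1 if data_offsets else 0


partition = Partition()
partition.produce(3, transactional=True)
lag_after_txn = partition.end_offset - consume_all(partition)

partition.produce(1, transactional=False)
lag_after_plain = partition.end_offset - consume_all(partition)

print(lag_after_txn, lag_after_plain)  # 1 0
```

In this model the marker is the only record left past the committed offset, so the lag is 1 per partition until a non-transactional message moves the committed offset beyond it.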
Expected behavior
The Consumer Lag column seen in the Consumers tab shows 0, if the consumer has processed all messages on its topic of interest.
Your installation details
App version we are using is v1.2.0. We are deploying its Docker image and running it as a container on AWS ECS.
Steps to reproduce
1. Point Kafka UI at a Kafka cluster whose version supports transactional messages.
2. Have a Kafka topic to which some producer writes messages transactionally. Let X be the number of partitions of that topic.
3. Have the producers produce messages in a transaction. Make sure that consumers consume these messages with no issues.
4. Observe that the Consumer Lag column in the Consumers tab keeps showing values greater than 0. The value can be as high as X, and it can also drop to 0.
Screenshots
No response
Logs
No response
Additional context
No response
Hi krumft! 👋
Welcome, and thank you for opening your first issue in the repo!
Please wait for triaging by our maintainers.
As development is carried out in our spare time, you can support us by sponsoring our activities or even funding the development of specific issues. Sponsorship link
If you plan to raise a PR for this issue, please take a look at our contributing guide.
I am not entirely certain that this is a bug, but more Kafka as intended. Although the consumer ignores the messages because it respects the transactional status, the messages are actually there and the consumer has not committed having read them: thus there is consumer lag.
Upon reread, is it just the transaction closing message that is adding to the lag? What happens when you start a new transaction? Will it step over the "missed" message?
> I am not entirely certain that this is a bug, but more Kafka as intended. Although the consumer ignores the messages because it respects the transactional status, the messages are actually there and the consumer has not committed having read them: thus there is consumer lag.
Yes, I agree with you.
> Upon reread, is it just the transaction closing message that is adding to the lag?
Yes, the marker message is the culprit, leading to lag of 1. That happens for each partition on which you have a transaction.
> What happens when you start a new transaction? Will it step over the "missed" message?
Yes, kind of :)
It is fine for you to ignore this one, it is hardly a real bug. It looks more like a question of how to present these little transactional quirks meaningfully in the UI. Maybe the best solution is the simplest: do nothing.
From the other perspective, if the tool behaved consistently, it should be possible to see the marker messages in the Messages tab. This is currently not possible. For example, here is what I see in the lag details: the consumer lags on partition 95 of the topic, having read and committed message 6198 while the latest offset is 6199.
At the same time, in the Messages tab the latest offset I can see is 6197.
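The arithmetic behind those three numbers can be written out as below. This is only a sketch under one assumption: the "latest offset" shown in the lag details is the log end offset, meaning the offset the next record would receive, so the last record actually stored sits at 6198.

```python
# Values read from the kafka-ui screens described above.
committed_offset = 6198      # consumer group's committed offset on partition 95
log_end_offset = 6199        # "latest offset" shown in the lag details
last_visible_offset = 6197   # newest message shown in the Messages tab

lag = log_end_offset - committed_offset
# Offsets that exist in the log but are never shown as messages.
hidden_offsets = list(range(last_visible_offset + 1, log_end_offset))

print(lag)             # 1
print(hidden_offsets)  # [6198]
```

Under that assumption, offset 6198 is the single hidden record, which matches a lag of 1 caused by one transaction marker.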
Yeah, I see what you mean. We have a similar issue with a producer always opening a transaction "just in case" it wants to produce at some point and then that one meta-message showing as lag in our metrics/the messages tab.
But I can see how this might be a good feature request. Maybe something like "transaction opening marker" and "closing marker". (Perhaps something you could toggle on or off.)
Yeah, as transactions are implemented as some protocols/abstractions on top of messages, offsets, etc., it becomes a challenge for the tool to decide whether to show the raw artifacts, or to respect the higher-level abstractions.
@krumft, thanks for raising the issue. @Masqueey, appreciate the clarification. Yes, you're absolutely right — there are "system" messages within the topic that are hidden in the UI because the consumer doesn’t display them. Additionally, the consumer only commits offsets for regular messages. This could definitely be confusing for users who aren't aware of this internal behavior.
From a UI perspective, what do you think would be the best approach to address or clarify this?
This is a hard question :)
Perhaps my personal choice would go the way suggested by @Masqueey: just like we already have a Show Internal Topics toggle, we could have a similar toggle for showing/hiding transactional messages. The challenge here is that this new toggle affects the UI of at least two different pages: Topics (as in, the messages on a given topic) and Consumers (the lag for a given consumer group). So it must be visible/accessible from both of these pages.
It would also be nice to check how the competition is approaching this challenge.
Thanks so much for the discussion.
@krumft From what I've seen, competitors haven't addressed this issue either — so we might be the first to solve it!
Here's what's on my mind:
In the Consumer Lag tab, we could add a checkbox to exclude transactional messages. If selected, we’d need to filter message types accordingly during the offset request.
For the Messages tab, I suggest we leave it as is, since there’s no API available to fetch those messages via the Consumer API.
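To picture what such a checkbox would change, here is a rough sketch. The `consumer_lag` function and its `exclude_transactional` flag are hypothetical, and the sketch simply assumes the kind of record at each offset is known; whether the offset request can actually provide that information is the open part of the proposal.

```python
def consumer_lag(records, committed_offset, exclude_transactional=False):
    """Lag on one partition; `records` lists the kind of record at each offset."""
    pending = records[committed_offset:]
    if exclude_transactional:
        # Count only records a consumer could actually receive.
        pending = [kind for kind in pending if kind == "data"]
    return len(pending)


records = ["data", "data", "commit_marker"]
committed_offset = 2  # both data records were read and committed

raw_lag = consumer_lag(records, committed_offset)
filtered_lag = consumer_lag(records, committed_offset, exclude_transactional=True)

print(raw_lag, filtered_lag)  # 1 0
```

With the flag off the result matches what the Consumers tab shows today; with it on, a fully caught-up consumer reports 0.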
> From what I've seen, competitors haven't addressed this issue either, so we might be the first to solve it!
Great, thanks for checking.
> In the Consumer Lag tab, we could add a checkbox to exclude transactional messages. If selected, we’d need to filter message types accordingly during the offset request.
Makes sense to me.
> For the Messages tab, I suggest we leave it as is, since there’s no API available to fetch those messages via the Consumer API.
Oh, I see. So in this case we're quite restricted in what we could do here. This is a strange asymmetry in Kafka's own APIs: you cannot consume control messages, yet they are part of the reported lag. Would it make sense to ping the Kafka team about that?
To apply some common sense and critical thinking: by implementing this feature, are we sure we're not coupling the tool too tightly to internal Kafka decisions that might change in a later version? Implementing transactions with control messages is an internal design choice of Kafka, and it could change at some point.
I/We still need to switch from Kafka UI to Kafbat UI, but I already wanted to chip in 🙂
When a transaction is rolled back, this causes additional “virtual” consumer lag because the messages of the transaction + the marker of the last successful transaction will also be considered as unconsumed (so for example a lag of 3 if there was 1 rolled-back message and the previous transaction was successful). It seems rollbacks should thus also be taken into account.
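The lag of 3 in that example can be traced with a small model of the partition log. This is a sketch under assumptions: the `log` layout is hypothetical, the consumer only receives data from committed transactions, and the abort marker is assumed to occupy an offset just like the commit marker does.

```python
from collections import Counter

# One entry per offset: (kind, outcome of the transaction it belongs to).
log = [
    ("data", "committed"),    # offset 0: delivered to the consumer
    ("marker", "committed"),  # offset 1: marker of the successful transaction
    ("data", "aborted"),      # offset 2: message from the rolled-back transaction
    ("marker", "aborted"),    # offset 3: marker of the rolled-back transaction
]

# Only data from committed transactions is delivered to the consumer.
delivered = [i for i, (kind, outcome) in enumerate(log)
             if kind == "data" and outcome == "committed"]
committed_offset = delivered[-1] + 1
lag = len(log) - committed_offset
# What the reported lag is actually made of.
breakdown = Counter(log[committed_offset:])

print(lag)  # 3
```

None of the three records counted in `breakdown` will ever reach the consumer, which is why the lag looks "virtual".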
Regarding the messages tab, if those markers can’t be displayed, it might still be useful to have a toggle to hide uncommitted messages – which can be because of ongoing transactions or rolled-back ones.
For the record, I can also confirm that Confluent has the same issue with consumer lag.