pinot
[WIP][NOT_FOR_REVIEW] Expose consuming segment's offset lag through /consumingSegmentsInfo API
A common user request is to understand whether there is any lag in the consuming segment of a realtime table. While lag calculation varies significantly across streaming sources, this PR proposes an approach to surface this information in a generic manner.
The PR demonstrates offset-lag computation for Kafka. It can be extended to show more useful lag information, such as the last consumed message timestamp (event time) or the ingestion rate. It can also be extended to other streaming systems that may support different lag calculation mechanisms.
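For readers unfamiliar with the term, offset lag for a partition is usually taken to be the latest offset available upstream minus the offset the consumer has reached. The following is only a minimal sketch of that arithmetic in plain Java, not the code in this PR: the class `OffsetLagSketch` and the methods `offsetLag` and `lagByPartition` are hypothetical names, and the offsets are supplied as plain maps. In a real Kafka setup the latest offsets could come from something like `KafkaConsumer.endOffsets`, but that wiring is assumed and not shown here.

```java
import java.util.Map;
import java.util.OptionalLong;
import java.util.TreeMap;

// Hypothetical helper, for illustration only; not part of Pinot.
class OffsetLagSketch {

    // Lag for one partition: latest upstream offset minus the consumer's current offset.
    // Returns empty when either offset is unknown, so "no data" is not reported as zero lag.
    static OptionalLong offsetLag(Long latestUpstreamOffset, Long currentOffset) {
        if (latestUpstreamOffset == null || currentOffset == null) {
            return OptionalLong.empty();
        }
        // Clamp at zero in case the upstream offset was fetched before the consumer advanced.
        return OptionalLong.of(Math.max(0L, latestUpstreamOffset - currentOffset));
    }

    // Computes lag for every partition that has a known consumer offset.
    static Map<Integer, OptionalLong> lagByPartition(Map<Integer, Long> latestOffsets,
                                                     Map<Integer, Long> currentOffsets) {
        Map<Integer, OptionalLong> result = new TreeMap<>();
        for (Map.Entry<Integer, Long> entry : currentOffsets.entrySet()) {
            Long latest = latestOffsets.get(entry.getKey()); // null if upstream offset is unknown
            result.put(entry.getKey(), offsetLag(latest, entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed sample offsets: partition 1 has no upstream offset available.
        Map<Integer, Long> latest = Map.of(0, 150L);
        Map<Integer, Long> current = Map.of(0, 100L, 1, 42L);
        System.out.println(lagByPartition(latest, current));
    }
}
```

Returning an empty value when an offset is unavailable keeps the result generic: a stream type that cannot report a latest offset simply yields no lag rather than a misleading number.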
Open questions for discussion:
- Does lag information require a new end-point or should we add to the existing /consumingSegmentsInfo?
- Currently, the /consumingSegmentsInfo only returns consumer lag information for active consuming segments. Would it be useful to fetch lag information even when the segment is not actively consuming (say, on a paused table)?
@navina @mcvsubbu Is this blocked on something? Some Pinot users are asking for this metric.
It's not blocked. Haven't been able to circle back to this. Will try to pick it up in the next week.
This PR, while still a WIP, is intended to expose consumer lag to the end user for at least Kafka topics (as that is the most common ask and something the Pinot consumer lacks compared with a Confluent consumer).
Taking this up!
Closing this in favor of https://github.com/apache/pinot/pull/9515