
Allowing empty segments with no offset advancing

Open jadami10 opened this issue 1 year ago • 4 comments

This is a follow-up to https://github.com/apache/pinot/issues/8929, but for the case where no data is consumed at all. We've since found a poor interaction between Pinot and our S3 lifecycle rules.

  • we have a few partitions that receive no data at all (not filtered out; literally 0 events)
  • Pinot keeps the consuming segment for each of these partitions open indefinitely (in this case we noticed one had been open for 1 year)
  • Pinot also keeps the last completed segment for each partition (in this case, it's from 2023)
  • We wanted to use S3 lifecycle rules to delete all data older than 30 days (our tables had 10-day retention) rather than rely only on Pinot's mechanisms, but this sent segments into error state because they were over a year old

Is there any reason we can't seal a segment whose offset hasn't advanced? Had that been possible, we would have had N segments for this partition, all with 0 records and the same start and end offset.
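To make the proposal concrete, here is a minimal sketch of a seal rule that fires on a time threshold even when the offset has not advanced, producing a 0-row segment whose start and end offsets are equal. Everything here is hypothetical: `EmptySegmentSealSketch`, `ConsumingState`, `SealedSegment`, `shouldSeal`, and the threshold values are illustrative names and numbers, not Pinot's actual classes, API, or defaults.

```java
import java.time.Duration;

// Hypothetical sketch of the proposed behavior; not Pinot's actual classes or API.
public class EmptySegmentSealSketch {

    // Simplified view of a consuming segment (hypothetical).
    record ConsumingState(long startOffset, long currentOffset, long rowCount, Duration openFor) {}

    // A sealed segment covering the offsets [startOffset, endOffset) (hypothetical).
    record SealedSegment(long startOffset, long endOffset, long rowCount) {}

    // Proposed rule: seal when either threshold is hit, without requiring the offset to advance.
    static boolean shouldSeal(ConsumingState s, Duration timeThreshold, long rowThreshold) {
        boolean timeUp = s.openFor().compareTo(timeThreshold) >= 0;
        boolean rowsUp = s.rowCount() >= rowThreshold;
        return timeUp || rowsUp;
    }

    // With no data consumed, endOffset == startOffset and the segment has 0 rows.
    static SealedSegment seal(ConsumingState s) {
        return new SealedSegment(s.startOffset(), s.currentOffset(), s.rowCount());
    }

    public static void main(String[] args) {
        // A partition that has received nothing for a year.
        ConsumingState idle = new ConsumingState(42L, 42L, 0L, Duration.ofDays(365));
        if (shouldSeal(idle, Duration.ofHours(6), 1_000_000L)) {
            System.out.println(seal(idle));
        }
    }
}
```

Under this rule an idle partition would roll over into a series of empty segments, one per time threshold, which matches the "N segments with 0 records and the same start/end offset" described above.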

cc @Jackie-Jiang @priyen-stripe

jadami10 · Mar 22 '24 15:03

I think we don't seal them right now because we didn't support empty segments before. Now that empty segments are supported, we should be able to seal them. We'll need to revisit the timestamp used for empty segments (using the current time should work) so that the retention manager can remove them properly.
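A minimal sketch of the timestamp suggestion, assuming retention compares a segment's time against the retention period: a segment with rows uses its max event time, while an empty segment falls back to the time it was sealed (the "current time"). The class and method names (`EmptySegmentRetentionSketch`, `retentionTime`, `isExpired`) are hypothetical and do not reflect the retention manager's real code.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.OptionalLong;

// Hypothetical sketch; not Pinot's retention manager implementation.
public class EmptySegmentRetentionSketch {

    // Empty segments have no event time, so use the seal time instead.
    static Instant retentionTime(OptionalLong maxEventTimeMillis, Instant sealedAt) {
        return maxEventTimeMillis.isPresent()
                ? Instant.ofEpochMilli(maxEventTimeMillis.getAsLong())
                : sealedAt;
    }

    // A segment expires once its time plus the retention period lies in the past.
    static boolean isExpired(Instant segmentTime, Duration retention, Instant now) {
        return segmentTime.plus(retention).isBefore(now);
    }

    public static void main(String[] args) {
        Instant sealedAt = Instant.parse("2024-03-01T00:00:00Z");
        Duration retention = Duration.ofDays(10);
        Instant t = retentionTime(OptionalLong.empty(), sealedAt);
        System.out.println(isExpired(t, retention, Instant.parse("2024-03-05T00:00:00Z"))); // false
        System.out.println(isExpired(t, retention, Instant.parse("2024-03-12T00:00:00Z"))); // true
    }
}
```

With a 10-day retention, each empty segment would be removed about 10 days after it was sealed, so an idle partition would no longer pin a year-old segment.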

Jackie-Jiang · Mar 22 '24 17:03