SegmentDeletionManager assumes segment is directly under table prefix in deep store

Open · dd-willgan opened this issue 1 year ago · 3 comments

Hi Pinot team, my company recently ran into an issue where expired segments were not being deleted from the deep store. We found that SegmentDeletionManager assumes a table's segments sit directly under that table's directory in the deep store (<dataDir>/<rawTableName>/<segment>), but in our case the segments were uploaded to subdirectories within the table directory, e.g. <dataDir>/<rawTableName>/<partition>/<segment>. Could Pinot try deleting the URI stored in the segment's ZK metadata as a fallback?

dd-willgan, Sep 30 '24 18:09
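To make the failure mode concrete, here is a minimal Java sketch of the proposed fallback. It assumes an in-memory set of paths stands in for the deep store and that the segment's ZK metadata exposes its stored download location as a plain path string. The class and method names (`SegmentDeletionFallbackSketch`, `deleteSegment`) are hypothetical and are not Pinot's SegmentDeletionManager API. The sketch first tries the `<dataDir>/<rawTableName>/<segment>` path and, only if nothing exists there, falls back to the stored location.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of "table-prefix path first, stored URI as fallback".
// Not Pinot's SegmentDeletionManager; the deep store is simulated by a Set of paths.
class SegmentDeletionFallbackSketch {

    // In-memory stand-in for a deep store: the set of object paths that currently exist.
    private final Set<String> deepStore;

    SegmentDeletionFallbackSketch(Set<String> deepStore) {
        this.deepStore = deepStore;
    }

    // The layout assumption described in the issue: <dataDir>/<rawTableName>/<segment>.
    static String tablePrefixPath(String dataDir, String rawTableName, String segmentName) {
        return dataDir + "/" + rawTableName + "/" + segmentName;
    }

    // Deletes the segment and returns the path that was removed, or empty if nothing was found.
    Optional<String> deleteSegment(String dataDir, String rawTableName, String segmentName,
                                   String storedDownloadPath) {
        String assumed = tablePrefixPath(dataDir, rawTableName, segmentName);
        if (deepStore.remove(assumed)) {
            return Optional.of(assumed);
        }
        // Fallback proposed in the issue: use the location recorded in the segment's ZK metadata
        // (assumed here to be available as storedDownloadPath; may be null).
        if (storedDownloadPath != null && deepStore.remove(storedDownloadPath)) {
            return Optional.of(storedDownloadPath);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Set<String> store = new HashSet<>();
        store.add("/data/myTable/partition_0/myTable_seg_0");
        SegmentDeletionFallbackSketch sketch = new SegmentDeletionFallbackSketch(store);
        Optional<String> deleted = sketch.deleteSegment("/data", "myTable", "myTable_seg_0",
                "/data/myTable/partition_0/myTable_seg_0");
        System.out.println(deleted.orElse("nothing deleted"));
        System.out.println("store empty: " + store.isEmpty());
    }
}
```

With the partitioned layout from the issue, the table-prefix path `/data/myTable/myTable_seg_0` does not exist, so only the fallback removes the segment; with a flat layout the first branch handles it and the stored path is never consulted.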

Trying to get more context here: do you use metadata push to upload segments? I think the underlying reasoning is that if the data is deliberately placed in a separate directory, Pinot doesn't delete it, in case the user wants to keep it around. But I guess we could introduce a config that controls whether Pinot deletes such files from the deep store (false by default).

Jackie-Jiang, Sep 30 '24 23:09

Hey @Jackie-Jiang, yes, we use SegmentMetadataPushJobRunner. I see. I would be okay with adding a flag to control this behavior, maybe something like controller.segment.delete.useStoredUri.

dd-willgan, Oct 01 '24 00:10
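The opt-in flag discussed above could gate the fallback roughly as follows. This is a hypothetical Java sketch: `controller.segment.delete.useStoredUri` is only the name proposed in this thread, not an existing Pinot controller property, and `DeletionCandidatesSketch` / `deletionCandidates` are illustrative names. It reads the flag from a `java.util.Properties` object with a default of `false`, so deployments that never set it keep today's table-prefix-only behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch: the property name below is the one proposed in the issue thread,
// not an existing Pinot configuration key.
class DeletionCandidatesSketch {
    static final String USE_STORED_URI_KEY = "controller.segment.delete.useStoredUri";

    // Returns the deep-store paths that deletion would try, in order.
    static List<String> deletionCandidates(Properties conf, String dataDir, String rawTableName,
                                           String segmentName, String storedDownloadPath) {
        List<String> candidates = new ArrayList<>();
        // Current behavior: only the path directly under the table prefix.
        candidates.add(dataDir + "/" + rawTableName + "/" + segmentName);
        // Opt-in behavior; defaults to false so existing deployments are unaffected.
        boolean useStoredUri = Boolean.parseBoolean(conf.getProperty(USE_STORED_URI_KEY, "false"));
        if (useStoredUri && storedDownloadPath != null && !candidates.contains(storedDownloadPath)) {
            candidates.add(storedDownloadPath);
        }
        return candidates;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        String stored = "/data/myTable/partition_0/myTable_seg_0";
        // Flag unset: only the table-prefix path is considered.
        System.out.println(deletionCandidates(conf, "/data", "myTable", "myTable_seg_0", stored));
        // Flag enabled: the stored location is tried after the table-prefix path.
        conf.setProperty(USE_STORED_URI_KEY, "true");
        System.out.println(deletionCandidates(conf, "/data", "myTable", "myTable_seg_0", stored));
    }
}
```

Keeping the default at `false` matches the concern raised earlier in the thread: files a user deliberately placed outside the table directory are left alone unless an operator opts in.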

cc @swaminathanmanish

Jackie-Jiang, Oct 05 '24 06:10