
Minion Task to support automatic Segment Refresh

Open vvivekiyer opened this issue 4 months ago • 3 comments

Currently, when columns are added or indexes are added/removed, the affected segments are reloaded on the servers. This approach has several problems:

  1. Longer startup times for Pinot Server hosts. Every time a server starts, it has to reload its segments, generating the missing indexes and columns. The problem is worse for Upsert tables. cc: @tibrewalpratik17 @ankitsultana
  2. Every server pays the reload compute cost whenever indexes/columns are added. As a result, servers are over-provisioned to absorb this cost.
  3. Reloading segments on servers while they are processing queries increases query latencies.
  4. Reloading all segments takes a long time, because by default only one segment is reloaded at a time. Increasing the reload concurrency hurts query latencies.
  5. The segment in the deep store never contains the new indexes/columns, so it diverges from the copy on the server. This makes it not ideal for disaster recovery.

This PR creates a Minion task that automatically refreshes segments when the table config or schema is updated with index or column changes. It can support automatic refresh for the following operations:

  1. Adding/removing indexes
  2. Adding columns
  3. Changing column data types to compatible types
  4. Converting segment versions
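To make the list above concrete, here is a minimal sketch of how a refresh task might decide whether a segment is stale: compare the state recorded for the segment against the state implied by the current table config and schema, and collect the differences. The `ColumnState` and `SegmentState` types and the `refresh_reasons` function are hypothetical illustrations, not Pinot's actual classes or task-generator logic; a segment with a non-empty result would be a candidate for refresh.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ColumnState:
    # Hypothetical per-column state: data type plus the set of index types.
    data_type: str
    indexes: frozenset = frozenset()


@dataclass
class SegmentState:
    # Hypothetical segment state: format version plus columns by name.
    version: str
    columns: dict = field(default_factory=dict)


def refresh_reasons(segment: SegmentState, desired: SegmentState) -> list:
    """Return the changes needed to bring `segment` in line with `desired`."""
    reasons = []
    if segment.version != desired.version:
        reasons.append(f"convert segment version {segment.version} -> {desired.version}")
    for name, want in desired.columns.items():
        have = segment.columns.get(name)
        if have is None:
            # Column exists in the schema but not in the segment.
            reasons.append(f"add column {name}")
            continue
        if have.data_type != want.data_type:
            reasons.append(f"change data type of {name}: {have.data_type} -> {want.data_type}")
        for idx in sorted(want.indexes - have.indexes):
            reasons.append(f"add {idx} index on {name}")
        for idx in sorted(have.indexes - want.indexes):
            reasons.append(f"remove {idx} index on {name}")
    # Columns present only in the segment are ignored: column removal is
    # not among the operations listed above.
    return reasons
```

For example, a segment with an inverted index on `city`, checked against a config that replaces it with a bloom filter and adds a `country` column, yields `["add bloom index on city", "remove inverted index on city", "add column country"]`.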

Follow-up work:

  1. When the table config or schema is updated, validate whether the column data type changes are compatible, and allow the compatible ones.
  2. Schedule the SegmentRefresh tasks as soon as the table config or schema is updated, rather than waiting for the next run of the periodic job.
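The two follow-up items could fit together roughly as follows: on a schema update, reject incompatible data type changes, and otherwise schedule a refresh immediately. This is a minimal sketch under stated assumptions. The allow-list of lossless widenings (`INT` to `LONG`, `INT` to `DOUBLE`, `FLOAT` to `DOUBLE`) is an assumption, not Pinot's actual compatibility rules, and `handle_schema_update` and its `schedule_refresh` callback are hypothetical names.

```python
# Assumed allow-list of lossless widenings; not Pinot's actual rules.
COMPATIBLE_WIDENINGS = {
    ("INT", "LONG"),
    ("INT", "DOUBLE"),
    ("FLOAT", "DOUBLE"),
}


def is_compatible(old_type: str, new_type: str) -> bool:
    # Unchanged types are always compatible; otherwise consult the allow-list.
    return old_type == new_type or (old_type, new_type) in COMPATIBLE_WIDENINGS


def incompatible_changes(old_schema: dict, new_schema: dict) -> list:
    """List columns whose type changed in a way the allow-list does not permit."""
    problems = []
    for column, old_type in old_schema.items():
        new_type = new_schema.get(column)
        if new_type is None:
            continue  # Column removal is out of scope for this check.
        if not is_compatible(old_type, new_type):
            problems.append(f"{column}: {old_type} -> {new_type}")
    return problems


def handle_schema_update(table: str, old_schema: dict, new_schema: dict, schedule_refresh) -> None:
    """Hypothetical hook: validate the update, then schedule a refresh right away."""
    problems = incompatible_changes(old_schema, new_schema)
    if problems:
        raise ValueError(f"incompatible data type changes for {table}: {problems}")
    # Trigger the refresh now instead of waiting for the periodic job.
    schedule_refresh(table)
```

Widening `INT` to `LONG` passes validation and schedules a refresh, while changing `STRING` to `INT` raises an error before any task is scheduled. Rejecting the whole update, rather than applying only its compatible parts, keeps the schema and the segments from drifting apart.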

Tested using integration tests.

vvivekiyer · Oct 24 '24 21:10