Minion Task to support automatic Segment Refresh
Currently, when columns are added or indexes are added or removed, segments are reloaded on the servers. This approach has several issues:
- Increased startup times for Pinot Server hosts. Servers have to reload segments (generating indexes and columns) every time they start up. This is particularly bad for Upsert tables. cc: @tibrewalpratik17 @ankitsultana
- The compute cost of a reload is paid on every server when indexes/columns are added. This leads to over-provisioning of servers to account for that cost.
- Reloading segments on servers while queries are being processed increases query latencies.
- Reloading all segments takes a long time (by default, one segment is reloaded at a time). Increasing the concurrency affects query latencies.
- The segment in the deepstore never contains the new indexes/columns, so the deepstore copy diverges from the copy on the server, which makes it unsuitable for disaster recovery.
This PR creates a minion task to automatically refresh segments when there are index/column updates to table config/schema. It can support automatic refresh for the following operations:
- Adding/Removing indexes
- Adding columns
- Changing column datatypes (compatible changes only)
- Converting segment versions
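
As a rough illustration of how a task generator could pick segments for refresh, the sketch below selects segments whose last refresh predates the latest table config or schema change. It is not the code in this PR: the `SegmentInfo` class, the `selectSegmentsToRefresh` method, the timestamp fields, and the per-run cap are all hypothetical names and assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentRefreshSketch {

    /** Hypothetical, simplified view of a segment's metadata (not a Pinot class). */
    static final class SegmentInfo {
        final String name;
        final long lastRefreshedMs;

        SegmentInfo(String name, long lastRefreshedMs) {
            this.name = name;
            this.lastRefreshedMs = lastRefreshedMs;
        }
    }

    /**
     * Returns the names of segments whose last refresh is older than the most
     * recent table config or schema update, up to maxSegmentsPerRun entries.
     * The cap is an assumption: it bounds the work scheduled per periodic run.
     */
    static List<String> selectSegmentsToRefresh(List<SegmentInfo> segments,
                                                long tableConfigUpdatedMs,
                                                long schemaUpdatedMs,
                                                int maxSegmentsPerRun) {
        long latestChangeMs = Math.max(tableConfigUpdatedMs, schemaUpdatedMs);
        List<String> selected = new ArrayList<>();
        for (SegmentInfo segment : segments) {
            if (selected.size() >= maxSegmentsPerRun) {
                break; // remaining stale segments wait for the next run
            }
            if (segment.lastRefreshedMs < latestChangeMs) {
                selected.add(segment.name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<SegmentInfo> segments = List.of(
            new SegmentInfo("seg_0", 100L),
            new SegmentInfo("seg_1", 200L),
            new SegmentInfo("seg_2", 50L));
        // Table config last changed at t=150, schema at t=120.
        System.out.println(selectSegmentsToRefresh(segments, 150L, 120L, 10));
    }
}
```

Comparing timestamps keeps the check cheap, since the generator never has to open a segment to decide. The trade-off is that any config or schema change marks every older segment as stale, even when the change does not affect that segment's indexes or columns.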
Followup Work:
- When the table config or schema is updated, validate whether the column datatype changes are compatible, and allow the update when they are.
- Schedule the SegmentRefresh tasks as soon as the table config or schema is updated, rather than waiting for the next iteration of the periodic job.
Tested using integration tests.