pinot
pinot copied to clipboard
[colocated-join] Add Support for Deterministic Segment Assignment Strategy
Will use this PR to discuss some questions offline and update details here afterwards.
Codecov Report
Merging #9199 (7f79c90) into master (802c596) will decrease coverage by
0.02%
. The diff coverage is54.54%
.
:exclamation: Current head 7f79c90 differs from pull request most recent head 8ada81e. Consider uploading reports for the commit 8ada81e to get more accurate results
@@ Coverage Diff @@
## master #9199 +/- ##
============================================
- Coverage 68.74% 68.71% -0.03%
+ Complexity 4997 4755 -242
============================================
Files 1852 1852
Lines 98848 98877 +29
Branches 15039 15048 +9
============================================
- Hits 67951 67944 -7
- Misses 26044 26087 +43
+ Partials 4853 4846 -7
Flag | Coverage Δ | |
---|---|---|
integration1 | 26.27% <6.06%> (-0.07%) |
:arrow_down: |
unittests1 | 67.13% <ø> (+0.04%) |
:arrow_up: |
unittests2 | 15.27% <54.54%> (-0.03%) |
:arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
Impacted Files | Coverage Δ | |
---|---|---|
.../assignment/segment/RealtimeSegmentAssignment.java | 89.28% <25.00%> (-5.02%) |
:arrow_down: |
...e/assignment/segment/OfflineSegmentAssignment.java | 87.30% <33.33%> (-4.30%) |
:arrow_down: |
...ore/assignment/segment/SegmentAssignmentUtils.java | 99.42% <100.00%> (+0.04%) |
:arrow_up: |
...data/manager/realtime/DefaultSegmentCommitter.java | 0.00% <0.00%> (-80.00%) |
:arrow_down: |
...a/manager/realtime/RealtimeSegmentDataManager.java | 75.00% <0.00%> (-25.00%) |
:arrow_down: |
...er/api/resources/LLCSegmentCompletionHandlers.java | 43.56% <0.00%> (-18.82%) |
:arrow_down: |
...data/manager/realtime/SegmentCommitterFactory.java | 70.58% <0.00%> (-11.77%) |
:arrow_down: |
.../filter/predicate/InPredicateEvaluatorFactory.java | 72.80% <0.00%> (-9.61%) |
:arrow_down: |
...inot/core/util/SegmentCompletionProtocolUtils.java | 57.69% <0.00%> (-7.70%) |
:arrow_down: |
...n/java/org/apache/pinot/common/utils/URIUtils.java | 66.66% <0.00%> (-7.41%) |
:arrow_down: |
... and 33 more |
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
This has the same behavior as setting
numInstancesPerPartition
to 1, andnumPartitions
to the partitions for the table. Currently we don't support explicitly setting partitions forRealtimeSegmentAssignment
, so we should fix that instead of introducing this new strategy
I feel this is not exactly the same as numInstancesPerPartition=1
, instead numInstancesPerPartition
shall be the same as the number of replicas. In fact, the strategy is somewhat deterministic as the added stragy name suggests. And the main purpose is to make the assignment declarative and reduce the operational cost, e.g. we dont need to update numInstancesPerPartition
when we increase numPartitions
to scale out the table etc
This has the same behavior as setting
numInstancesPerPartition
to 1, andnumPartitions
to the partitions for the table. Currently we don't support explicitly setting partitions forRealtimeSegmentAssignment
, so we should fix that instead of introducing this new strategyI feel this is not exactly the same as
numInstancesPerPartition=1
, insteadnumInstancesPerPartition
shall be the same as the number of replicas. In fact, the strategy is somewhat deterministic as the added stragy name suggests. And the main purpose is to make the assignment declarative and reduce the operational cost, e.g. we dont need to updatenumInstancesPerPartition
when we increasenumPartitions
to scale out the table etc
This new strategy is trying to put all segments from the same segment partition to the same server, which is exactly the same as what numInstancesPerPartition = 1
is doing (and it is also deterministic).
In order to achieve this, you should use the partition based replica-group assignment, and always put numInstancesPerPartition = 1
. You may increase numPartitions
if needed (should be very rare because it should be the same as table partitions, and changing it requires re-partitioning all the segments). Noted that numPartitions
can be larger than the total instances within a replica-group. E.g. with 9 servers, you can have 3 replica-groups (3 servers per replica), then 18 partitions, and each server will host 6 partitions. You may scale it up to 18 servers, and run a rebalance without changing numInstancesPerPartition
or numPartitions
. After rebalancing, each server will host 3 partitions.
I see whaat you mean. Then that's a reasonable suggestion. But I hope we can find a way to enforce this as it's not so easy for users to understand the details to setup colocation correctly. I feel either some alias, or some validation in the table config is needed.
@yupeng9 True. I discussed this with @ankitsultana and suggested that we may introduce a flag to mark colocated tables, then we can check the table config to enforce numInstancesPerPartition=1
Currently we don't support explicitly setting partitions for RealtimeSegmentAssignment, so we should fix that instead of introducing this new strategy
Yeah I'll start working on it. I am not so sure about using numInstancesPerPartition=1, but at the same time I think I need to take a holistic look at the colocation setup once we have done some PoCs internally (ETA: 2 weeks from now).
Edit: Looks like you already added it: https://github.com/apache/pinot/commit/6c5d85ef7eeffcd4cddd639865b81764fb04dbf1. That's great