pinot icon indicating copy to clipboard operation
pinot copied to clipboard

Fix bug when evaluating resource status during service startup check

Open itschrispeck opened this issue 1 year ago • 1 comments

We have seen servers sometimes fail to pass the service status checker until the timeout is reached, even after all segments are online/in the expected state. Logs show:

Sleep for 10000ms as service status has not turned GOOD: MultipleCallbackServiceStatusCallback:IdealStateAndCurrentStateMatchServiceStatusCallback:Helix state does not exist, waitingFor=CurrentStateMatch, resource=table_REALTIME, numResourcesLeft=2, numTotalResources=802, minStartCount=802,;IdealStateAndExternalViewMatchServiceStatusCallback:Init;;

This is due to this check, which considers the table resource to have STARTING status if the external view/current state is null and ideal state is not. However, this isn't a valid assumption since the current state can be null if the last segment on the server is removed and the ideal state still exists. We primary see this behavior with completed segment redistribution turned on, on small tables.

The change here is meant to allow the resource status to return GOOD if the instance is no longer assigned any segment (when the server first started and collected all resources to monitor it was assigned).

itschrispeck avatar Jul 04 '24 09:07 itschrispeck

Codecov Report

Attention: Patch coverage is 92.30769% with 1 line in your changes missing coverage. Please review.

Project coverage is 61.96%. Comparing base (59551e4) to head (651b64c). Report is 2306 commits behind head on master.

Files with missing lines Patch % Lines
...a/org/apache/pinot/common/utils/ServiceStatus.java 92.30% 0 Missing and 1 partial :warning:
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13541      +/-   ##
============================================
+ Coverage     61.75%   61.96%   +0.21%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2554     +118     
  Lines        133233   140563    +7330     
  Branches      20636    21870    +1234     
============================================
+ Hits          82274    87096    +4822     
- Misses        44911    46838    +1927     
- Partials       6048     6629     +581     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) :arrow_down:
integration <0.01% <0.00%> (-0.01%) :arrow_down:
integration1 <0.01% <0.00%> (-0.01%) :arrow_down:
integration2 0.00% <0.00%> (ø)
java-11 61.91% <92.30%> (+0.20%) :arrow_up:
java-21 61.83% <92.30%> (+0.21%) :arrow_up:
skip-bytebuffers-false 61.94% <92.30%> (+0.19%) :arrow_up:
skip-bytebuffers-true 61.80% <92.30%> (+34.07%) :arrow_up:
temurin 61.96% <92.30%> (+0.21%) :arrow_up:
unittests 61.95% <92.30%> (+0.21%) :arrow_up:
unittests1 46.47% <92.30%> (-0.43%) :arrow_down:
unittests2 27.71% <0.00%> (-0.02%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

codecov-commenter avatar Jul 04 '24 09:07 codecov-commenter

I don't fully follow the change. When there is no partition assigned to the given instance, the loop should all be skipped, and it should return GOOD right?

Not sure I understand, are you referring to this loop? If so, the if condition linked in the PR description shows that it will return STARTING status if the current state is null

Or if you mean the outer loop, the issue is that _resourceIterator collected all assigned resources when the server first started up. The resources being iterated through are not necessarily still assigned to the server, and should be re-evaluated

itschrispeck avatar Jul 16 '24 21:07 itschrispeck

My bad, now I see the problem. Should we consider first looping over the ideal state and create a map from serving partition to state, then only loop over this map in the second loop? Similar to the current approach but without the early termination in the first loop.

Jackie-Jiang avatar Jul 17 '24 00:07 Jackie-Jiang

My bad, now I see the problem. Should we consider first looping over the ideal state and create a map from serving partition to state, then only loop over this map in the second loop? Similar to the current approach but without the early termination in the first loop.

Yeah, I think that is clearer. Updated to use idealState.getRecord().getMapFields() as well

itschrispeck avatar Jul 24 '24 23:07 itschrispeck