OpenSearch
Fix getting replication type in NodeVersionAllocationDecider
Description
This PR fixes the incorrect way the replication type is obtained in org.opensearch.cluster.routing.allocation.decider.NodeVersionAllocationDecider. The decider read it from node settings, but the replication type should be read from index metadata. I also added a test that verifies that the primary shard can be allocated to a node with a higher version when the replication type is document.
Related Issues
Resolves #12744
Check List
- [ ] New functionality includes testing.
- [ ] All tests pass
- [ ] New functionality has been documented.
- [ ] New functionality has javadoc added
- [ ] Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
- [ ] Commits are signed per the DCO using --signoff
- [ ] Commit changes are listed out in CHANGELOG.md file (See: Changelog)
- [ ] Public documentation issue/PR created
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.
:x: Gradle check result for d1b71d2beb6001bc0576005ea00c848aeb22509c: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Compatibility status:
Checks if related components are compatible with change 5c30a74
Incompatible components
Skipped components
Compatible components
Compatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/sql.git]
:white_check_mark: Gradle check result for 47434ca3cc574fbeb29d5a6b2fc2b1a575adc794: SUCCESS
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 71.36%. Comparing base (b15cb0c) to head (5c30a74). Report is 627 commits behind head on main.
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #12811      +/-   ##
============================================
- Coverage     71.42%   71.36%   -0.06%
- Complexity    59978    60202     +224
============================================
  Files          4985     5011      +26
  Lines        282275   283557    +1282
  Branches      40946    41089     +143
============================================
+ Hits         201603   202373     +770
- Misses        63999    64407     +408
- Partials      16673    16777     +104
:grey_exclamation: Gradle check result for 5c30a74456e6743282f3af98ef950964ddc64f19: UNSTABLE
- TEST FAILURES:
1 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testRestartPrimary_NoReplicas
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
Thanks for submitting the PR. I have a couple of questions:
- What is the expected behaviour? Is it that we should be able to move replicas from "Segment replication" nodes to "Document replication" nodes?
- Are there any forward-backward compatibility issues with deploying this fix in production?
- The expected behavior is that the primary shard of a segment-replication index cannot be allocated to a node with a higher version than the node that holds the replica shard, while the primary shard of a document-replication index can be allocated to a node with a higher version. Because the replication type is an index-scoped setting, there are no "Segment replication" nodes or "Document replication" nodes. There are only "Segment replication" indices and "Document replication" indices.
- I don't think there are any forward-backward compatibility issues with deploying this fix in production.
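The rule described above can be illustrated with a minimal, self-contained sketch. This is not the actual NodeVersionAllocationDecider code: the class `ReplicationTypeSketch`, the method names, and the use of plain integers for node versions are all hypothetical. The setting key `index.replication.type` and its default of `DOCUMENT` are assumed here. The sketch models only the rule stated in this thread, reading the replication type from index-scoped settings rather than node settings.

```java
import java.util.Map;

// Hypothetical, simplified model of the rule discussed in this thread.
// It does not use any OpenSearch classes.
class ReplicationTypeSketch {

    enum ReplicationType { DOCUMENT, SEGMENT }

    // Assumption: the replication type lives in index-scoped settings under
    // "index.replication.type" and defaults to DOCUMENT when absent.
    static ReplicationType replicationTypeOf(Map<String, String> indexSettings) {
        String value = indexSettings.getOrDefault("index.replication.type", "DOCUMENT");
        return ReplicationType.valueOf(value);
    }

    // Versions are modelled as plain integers for illustration only.
    static boolean canAllocatePrimary(Map<String, String> indexSettings,
                                      int targetNodeVersion,
                                      int replicaNodeVersion) {
        if (replicationTypeOf(indexSettings) == ReplicationType.SEGMENT) {
            // Segment replication: the primary must not end up on a node
            // newer than the node that holds the replica.
            return targetNodeVersion <= replicaNodeVersion;
        }
        // Document replication: a newer target node is acceptable
        // (other version checks of the real decider are not modelled).
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> segmentIndex = Map.of("index.replication.type", "SEGMENT");
        Map<String, String> documentIndex = Map.of("index.replication.type", "DOCUMENT");
        System.out.println(canAllocatePrimary(segmentIndex, 3, 2));
        System.out.println(canAllocatePrimary(documentIndex, 3, 2));
    }
}
```

Because the decision takes the index settings as input, two indices on the same pair of nodes can get different answers, which matches the point that the replication type belongs to the index and not to the node.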
This PR is stalled because it has been open for 30 days with no activity.
[Storage Triage - attendees 1 2 3 4 5 6 7 8 9 10 11 12 13]
@KunjueYu Thanks for opening this PR. Please add a release target label and double check if this is actually related to Remote Store, else remove the label
I don't have permission to edit the labels of this PR. This PR is not related to Remote Store, so the label should be removed. I am not familiar with choosing a release target label. Could the v2.13.1 label be added?
Removed the storage label. If you are targeting the v2.15 release (the next release), we can add the 2.15 label.
This PR is stalled because it has been open for 30 days with no activity.
@KunjueYu can you check @gaobinlong's comments and help take the PR to closure?