fdb-kubernetes-operator
Add Localities to FoundationDBClusterSpec
Description
This PR implements the first step towards supporting the three data hall configuration in the operator.
As discussed in: https://github.com/FoundationDB/fdb-kubernetes-operator/issues/348
Type of change
Please select one of the options below.
New feature (non-breaking change which adds functionality)
Discussion
The changes in this PR are based on https://github.com/FoundationDB/fdb-kubernetes-operator/blob/main/docs/design/three_datahall.md
Testing
Please describe the tests that you ran to verify your changes. Unit tests? Manual testing?
docker run --rm --entrypoint=/bin/bash -ti --platform="linux/amd64" -v $(pwd):/work -w /work docker.io/library/golang:1.18.5 ./scripts/setup_container.sh
root@f7a6ffa5d70d:/work# make manifests
...
/work
/go/bin/controller-gen "crd:maxDescLen=0,crdVersions=v1,generateEmbeddedObjectMeta=true" rbac:roleName=manager-role webhook paths="./..." output:crd:artifacts:config=config/crd/bases
# Per default controller-gen will generate a ClusterRole for our example we want to use a Role and the namespace marker doesn't
# work since it requires a namespace and kustomize doesn't support to change the Kind.
make: Warning: File 'config/crd/bases/apps.foundationdb.org_foundationdbclusters.yaml' has modification time 0.35 s in the future
make: warning: Clock skew detected. Your build may be incomplete.
root@f7a6ffa5d70d:/work# make test
go test ./... -coverprofile cover.out
? github.com/FoundationDB/fdb-kubernetes-operator [no test files]
ok github.com/FoundationDB/fdb-kubernetes-operator/api/v1beta1 0.388s coverage: 41.2% of statements
ok github.com/FoundationDB/fdb-kubernetes-operator/api/v1beta2 0.516s coverage: 46.8% of statements
? github.com/FoundationDB/fdb-kubernetes-operator/cmd/po-docgen [no test files]
...
Do we need to perform additional testing once this is merged, or perform in a larger testing environment? These changes should not have any effect, so I believe additional testing is not required.
Documentation
Did you update relevant documentation within this repository? No
If this change is adding new functionality, do we need to describe it in our user manual? Yes, once the three data hall functionality is fully implemented it should be documented.
If this change is adding or removing subreconcilers, have we updated the core technical design doc to reflect that? N/A
If this change is adding new safety checks or new potential failure modes, have we documented them and how to debug potential issues? N/A
Follow-up
Are there any follow-up issues that we should pursue in the future?
Does this introduce new defaults that we should re-evaluate in the future? No
Result of fdb-kubernetes-operator-pr on Linux CentOS 7
- Commit ID: 10d2e4a058d8c103cf16e6027a55726199d96058
- Duration 4:05:51
- Result: :x: FAILED
- Error:
Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1 - Build Logs (available for 30 days)
- Build Artifact (available for 30 days)
Is the plan to add the functionality in another PR?
I can continue working on this PR, no problem, if that is your preference.
Result of fdb-kubernetes-operator-pr on Linux CentOS 7
- Commit ID: 3eb6e7d5592feb589e5421c5e1981f6e2d914d16
- Duration 4:05:45
- Result: :white_check_mark: SUCCEEDED
- Error:
N/A - Build Logs (available for 30 days)
- Build Artifact (available for 30 days)
Personally, I would prefer to implement them in the same PR; otherwise we have new fields on the CRD that have no effect.
Result of fdb-kubernetes-operator-pr on Linux CentOS 7
- Commit ID: fceb6a1cad6360951d9765535504046b9cbc2f90
- Duration 4:06:44
- Result: :x: FAILED
- Error:
Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1 - Build Logs (available for 30 days)
- Build Artifact (available for 30 days)
Result of fdb-kubernetes-operator-pr on Linux CentOS 7
- Commit ID: 2f0b16219b8fa7083c5001490cce30c4bc864eb2
- Duration 2:47:39
- Result: :x: FAILED
- Error:
Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1 - Build Logs (available for 30 days)
- Build Artifact (available for 30 days)
Result of fdb-kubernetes-operator-pr on Linux CentOS 7
- Commit ID: fc6166e3c923d507187e73dd14e3d1454b9454cb
- Duration 4:07:56
- Result: :x: FAILED
- Error:
Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1 - Build Logs (available for 30 days)
- Build Artifact (available for 30 days)
Result of fdb-kubernetes-operator-pr on Linux CentOS 7
- Commit ID: eb7cc0917a9e536e874c3ee781c005d4fcf149e1
- Duration 4:07:44
- Result: :x: FAILED
- Error:
Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1 - Build Logs (available for 30 days)
- Build Artifact (available for 30 days)
Result of fdb-kubernetes-operator-pr on Linux CentOS 7
- Commit ID: aa82180ca52c8eed69c90794fb8d449de00f9442
- Duration 4:07:48
- Result: :x: FAILED
- Error:
Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1 - Build Logs (available for 30 days)
- Build Artifact (available for 30 days)
Result of fdb-kubernetes-operator-pr on Linux CentOS 7
- Commit ID: 5349992c9f375e5ab30ff6f1616b4cd8adcbfac8
- Duration 4:07:54
- Result: :x: FAILED
- Error:
Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1 - Build Logs (available for 30 days)
- Build Artifact (available for 30 days)
Result of fdb-kubernetes-operator-pr on Linux CentOS 7
- Commit ID: 02903323fd336b0dd6cdc579913b0fce38cd204f
- Duration 4:10:16
- Result: :x: FAILED
- Error:
Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1 - Build Logs (available for 30 days)
- Build Artifact (available for 30 days)
Result of fdb-kubernetes-operator-pr on Linux CentOS 7
- Commit ID: a109ac6ac613fa98f82d70e1d972c476e425cc80
- Duration 4:07:47
- Result: :x: FAILED
- Error:
Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1 - Build Logs (available for 30 days)
- Build Artifact (available for 30 days)
Result of fdb-kubernetes-operator-pr on Linux CentOS 7
- Commit ID: 2e32144722452efb8991ed97ad79db15883ca2b4
- Duration 4:10:02
- Result: :x: FAILED
- Error:
Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1 - Build Logs (available for 30 days)
- Build Artifact (available for 30 days)
Result of fdb-kubernetes-operator-pr on Linux CentOS 7
- Commit ID: 83e42c93a962a7c8ca50859223dc99bcd8485925
- Duration 4:08:01
- Result: :x: FAILED
- Error:
Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1 - Build Logs (available for 30 days)
- Build Artifact (available for 30 days)
Result of fdb-kubernetes-operator-pr on Linux CentOS 7
- Commit ID: 6c809bc347847c35eaca2ea8dd194be2330fee9e
- Duration 4:08:07
- Result: :x: FAILED
- Error:
Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1 - Build Logs (available for 30 days)
- Build Artifact (available for 30 days)
Result of fdb-kubernetes-operator-pr on Linux CentOS 7
- Commit ID: 6ad5effca2382a6a8c326771a57ed09723ca2264
- Duration 4:08:02
- Result: :x: FAILED
- Error:
Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1 - Build Logs (available for 30 days)
- Build Artifact (available for 30 days)
Result of fdb-kubernetes-operator-pr on Linux CentOS 7
- Commit ID: 526a3ab7b691aaf9cf87f7843d3965ed41efe9ad
- Duration 4:07:54
- Result: :x: FAILED
- Error:
Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1 - Build Logs (available for 30 days)
- Build Artifact (available for 30 days)
Result of fdb-kubernetes-operator-pr on Linux CentOS 7
- Commit ID: 5839e7e1d8f2af9cc5624f7595f5205092c1adc9
- Duration 4:07:40
- Result: :x: FAILED
- Error:
Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1 - Build Logs (available for 30 days)
- Build Artifact (available for 30 days)
Result of fdb-kubernetes-operator-pr on Linux CentOS 7
- Commit ID: 226f664a779bc5e0c99259edaff5a3da09490c94
- Duration 4:07:57
- Result: :x: FAILED
- Error:
Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1 - Build Logs (available for 30 days)
- Build Artifact (available for 30 days)
Result of fdb-kubernetes-operator-pr on Linux CentOS 7
- Commit ID: 492ef538a851ab3d298c64703b132dd64ce346b4
- Duration 4:07:57
- Result: :x: FAILED
- Error:
Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1 - Build Logs (available for 30 days)
- Build Artifact (available for 30 days)
Result of fdb-kubernetes-operator-pr on Linux CentOS 7
- Commit ID: 2e42fe074880d0cb1899d25a1066022a5038658e
- Duration 4:08:09
- Result: :x: FAILED
- Error:
Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1 - Build Logs (available for 30 days)
- Build Artifact (available for 30 days)
Result of fdb-kubernetes-operator-pr on Linux CentOS 7
- Commit ID: 60f00f7098e206150df86b2d1f215d65bd8deea5
- Duration 4:08:08
- Result: :x: FAILED
- Error:
Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1 - Build Logs (available for 30 days)
- Build Artifact (available for 30 days)
Result of fdb-kubernetes-operator-pr on Linux CentOS 7
- Commit ID: d10e1a678286e9f951800afd423977c7a823dac2
- Duration 4:08:03
- Result: :x: FAILED
- Error:
Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1 - Build Logs (available for 30 days)
- Build Artifact (available for 30 days)
It probably needs some work, but I believe most of the initial implementation is covered.
To keep track of the process group locality, I have updated the cluster status with a new field, ProcessGroupLocalityZoneID, which is updated by the status update sub-reconciler.
The idea is to set the Pod nodeSelector depending on the distribution of the process groups, picking the zone with the fewest process groups when adding new pods.
The zone is passed down to the Monitor configuration to set the locality_data_hall argument using the FDB_ZONE_ID variable.
When removing pods we can follow the same approach: to avoid unbalancing the cluster, we can pick the zone with the most pods and remove a process from it. (This is not yet implemented, but I added some comments where I believe the logic should go.)
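As a rough illustration of the balancing idea described above, here is a minimal Go sketch. It is not the code in this PR: the function names (countByZone, zoneForNewPod, zoneForRemoval), the zone names, and the plain map used in place of the cluster status are all hypothetical, and ties are broken alphabetically only to keep the result deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// countByZone tallies process groups per zone. processGroupZones maps a
// (hypothetical) process group ID to the zone it runs in. Every configured
// zone is present in the result, even when it holds no process groups.
func countByZone(zones []string, processGroupZones map[string]string) map[string]int {
	counts := make(map[string]int, len(zones))
	for _, zone := range zones {
		counts[zone] = 0
	}
	for _, zone := range processGroupZones {
		if _, known := counts[zone]; known {
			counts[zone]++
		}
	}
	return counts
}

// sortedZones returns the zone names in alphabetical order so that the
// selection below does not depend on Go's random map iteration order.
func sortedZones(counts map[string]int) []string {
	names := make([]string, 0, len(counts))
	for zone := range counts {
		names = append(names, zone)
	}
	sort.Strings(names)
	return names
}

// zoneForNewPod picks the zone with the fewest process groups.
// The boolean is false when no zones are configured.
func zoneForNewPod(counts map[string]int) (string, bool) {
	best, found := "", false
	for _, zone := range sortedZones(counts) {
		if !found || counts[zone] < counts[best] {
			best, found = zone, true
		}
	}
	return best, found
}

// zoneForRemoval picks the zone with the most process groups, so that
// removing a process from it keeps the distribution as even as possible.
func zoneForRemoval(counts map[string]int) (string, bool) {
	best, found := "", false
	for _, zone := range sortedZones(counts) {
		if !found || counts[zone] > counts[best] {
			best, found = zone, true
		}
	}
	return best, found
}

func main() {
	zones := []string{"hall-a", "hall-b", "hall-c"}
	placement := map[string]string{
		"storage-1": "hall-a",
		"storage-2": "hall-a",
		"storage-3": "hall-b",
		"log-1":     "hall-c",
		"log-2":     "hall-c",
	}
	counts := countByZone(zones, placement)

	addTo, _ := zoneForNewPod(counts)
	removeFrom, _ := zoneForRemoval(counts)
	fmt.Println("add new pod to:", addTo)       // hall-b
	fmt.Println("remove pod from:", removeFrom) // hall-a
}
```

In the operator itself the selected zone would presumably feed the Pod nodeSelector, and the counts would come from the per-process-group locality recorded in the cluster status rather than from a hand-built map.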
Result of fdb-kubernetes-operator-pr on Linux CentOS 7
- Commit ID: 7be57e0dd200c9d286bdc7dd2949ca1f2af38f02
- Duration 4:07:55
- Result: :x: FAILED
- Error:
Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1 - Build Logs (available for 30 days)
- Build Artifact (available for 30 days)
Result of fdb-kubernetes-operator-pr on Linux CentOS 7
- Commit ID: d4263f172d4afe86e32163683b8f928220fe3add
- Duration 4:07:56
- Result: :x: FAILED
- Error:
Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1 - Build Logs (available for 30 days)
- Build Artifact (available for 30 days)
I'm not sure about the changes to and use of the ZONE_ID. I would think that for data hall we want a dedicated env variable rather than reusing the zone ID (since the zone ID is still used in the three data hall mode).
Could you add some more tests around the pod locality changes?
Agreed. My bad, I mixed things up a bit here. Thank you for the clarification.
Result of fdb-kubernetes-operator-pr on Linux CentOS 7
- Commit ID: 12b9cb9d48fac8394bee43d62d9e43e9df828aed
- Duration 4:09:54
- Result: :x: FAILED
- Error:
Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1 - Build Logs (available for 30 days)
- Build Artifact (available for 30 days)
Result of fdb-kubernetes-operator-pr on Linux CentOS 7
- Commit ID: 0a082f4ec8db6cb482ef5ec0c4cf281630513601
- Duration 4:10:00
- Result: :x: FAILED
- Error:
Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1 - Build Logs (available for 30 days)
- Build Artifact (available for 30 days)
Result of fdb-kubernetes-operator-pr on Linux CentOS 7
- Commit ID: 169f334519aaec18194655cec87598fcad74fc53
- Duration 4:09:42
- Result: :x: FAILED
- Error:
Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1 - Build Logs (available for 30 days)
- Build Artifact (available for 30 days)