
Add Localities to FoundationDBClusterSpec

Status: Open • manfontan opened this issue 3 years ago • 4 comments

Description

This PR implements the first step towards supporting the three data hall configuration in the operator.

As discussed in: https://github.com/FoundationDB/fdb-kubernetes-operator/issues/348

Type of change

Please select one of the options below.

New feature (non-breaking change which adds functionality)

Discussion

The changes in this PR are based on https://github.com/FoundationDB/fdb-kubernetes-operator/blob/main/docs/design/three_datahall.md

Testing

Please describe the tests that you ran to verify your changes. Unit tests? Manual testing?

```
docker run --rm --entrypoint=/bin/bash -ti --platform="linux/amd64" -v $(pwd):/work -w /work docker.io/library/golang:1.18.5 ./scripts/setup_container.sh
root@f7a6ffa5d70d:/work# make manifests
...
/work
/go/bin/controller-gen "crd:maxDescLen=0,crdVersions=v1,generateEmbeddedObjectMeta=true" rbac:roleName=manager-role webhook paths="./..." output:crd:artifacts:config=config/crd/bases
# Per default controller-gen will generate a ClusterRole for our example we want to use a Role and the namespace marker doesn't
# work since it requires a namespace and kustomize doesn't support to change the Kind.
make: Warning: File 'config/crd/bases/apps.foundationdb.org_foundationdbclusters.yaml' has modification time 0.35 s in the future
make: warning:  Clock skew detected.  Your build may be incomplete.
root@f7a6ffa5d70d:/work# make test
go test  ./... -coverprofile cover.out
?       github.com/FoundationDB/fdb-kubernetes-operator [no test files]
ok      github.com/FoundationDB/fdb-kubernetes-operator/api/v1beta1     0.388s  coverage: 41.2% of statements
ok      github.com/FoundationDB/fdb-kubernetes-operator/api/v1beta2     0.516s  coverage: 46.8% of statements
?       github.com/FoundationDB/fdb-kubernetes-operator/cmd/po-docgen   [no test files]
...
```

Do we need to perform additional testing once this is merged, or perform in a larger testing environment? These changes should not have any effect, so I believe additional testing is not required.

Documentation

Did you update relevant documentation within this repository? No

If this change is adding new functionality, do we need to describe it in our user manual? Yes, once the three data hall functionality is fully implemented, it should be documented.

If this change is adding or removing subreconcilers, have we updated the core technical design doc to reflect that? N/A

If this change is adding new safety checks or new potential failure modes, have we documented them and how to debug potential issues? N/A

Follow-up

Are there any follow-up issues that we should pursue in the future?

Does this introduce new defaults that we should re-evaluate in the future? No

manfontan, Sep 15 '22

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 10d2e4a058d8c103cf16e6027a55726199d96058
  • Duration 4:05:51
  • Result: :x: FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

foundationdb-ci, Sep 15 '22

> Is the plan to add the functionality in another PR?

I can continue working on this PR, no problem, if that is your preference.

manfontan, Sep 16 '22

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 3eb6e7d5592feb589e5421c5e1981f6e2d914d16
  • Duration 4:05:45
  • Result: :white_check_mark: SUCCEEDED
  • Error: N/A
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

foundationdb-ci, Sep 16 '22

>> Is the plan to add the functionality in another PR?
>
> I can continue working on this PR, no problem, if that is your preference.

Personally, I would prefer to implement them in the same PR; otherwise, we add new fields to the CRD that have no effect.

johscheuer, Sep 21 '22

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: fceb6a1cad6360951d9765535504046b9cbc2f90
  • Duration 4:06:44
  • Result: :x: FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

foundationdb-ci, Nov 16 '22

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 2f0b16219b8fa7083c5001490cce30c4bc864eb2
  • Duration 2:47:39
  • Result: :x: FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

foundationdb-ci, Nov 17 '22

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: fc6166e3c923d507187e73dd14e3d1454b9454cb
  • Duration 4:07:56
  • Result: :x: FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

foundationdb-ci, Nov 22 '22

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: eb7cc0917a9e536e874c3ee781c005d4fcf149e1
  • Duration 4:07:44
  • Result: :x: FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

foundationdb-ci, Nov 22 '22

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: aa82180ca52c8eed69c90794fb8d449de00f9442
  • Duration 4:07:48
  • Result: :x: FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

foundationdb-ci, Nov 22 '22

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 5349992c9f375e5ab30ff6f1616b4cd8adcbfac8
  • Duration 4:07:54
  • Result: :x: FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

foundationdb-ci, Nov 22 '22

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 02903323fd336b0dd6cdc579913b0fce38cd204f
  • Duration 4:10:16
  • Result: :x: FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

foundationdb-ci, Nov 22 '22

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: a109ac6ac613fa98f82d70e1d972c476e425cc80
  • Duration 4:07:47
  • Result: :x: FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

foundationdb-ci, Nov 22 '22

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 2e32144722452efb8991ed97ad79db15883ca2b4
  • Duration 4:10:02
  • Result: :x: FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

foundationdb-ci, Nov 22 '22

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 83e42c93a962a7c8ca50859223dc99bcd8485925
  • Duration 4:08:01
  • Result: :x: FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

foundationdb-ci, Nov 22 '22

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 6c809bc347847c35eaca2ea8dd194be2330fee9e
  • Duration 4:08:07
  • Result: :x: FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

foundationdb-ci, Nov 22 '22

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 6ad5effca2382a6a8c326771a57ed09723ca2264
  • Duration 4:08:02
  • Result: :x: FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

foundationdb-ci, Nov 23 '22

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 526a3ab7b691aaf9cf87f7843d3965ed41efe9ad
  • Duration 4:07:54
  • Result: :x: FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

foundationdb-ci, Nov 23 '22

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 5839e7e1d8f2af9cc5624f7595f5205092c1adc9
  • Duration 4:07:40
  • Result: :x: FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

foundationdb-ci, Nov 23 '22

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 226f664a779bc5e0c99259edaff5a3da09490c94
  • Duration 4:07:57
  • Result: :x: FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

foundationdb-ci, Nov 23 '22

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 492ef538a851ab3d298c64703b132dd64ce346b4
  • Duration 4:07:57
  • Result: :x: FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

foundationdb-ci, Nov 23 '22

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 2e42fe074880d0cb1899d25a1066022a5038658e
  • Duration 4:08:09
  • Result: :x: FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

foundationdb-ci, Nov 23 '22

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 60f00f7098e206150df86b2d1f215d65bd8deea5
  • Duration 4:08:08
  • Result: :x: FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

foundationdb-ci, Nov 23 '22

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: d10e1a678286e9f951800afd423977c7a823dac2
  • Duration 4:08:03
  • Result: :x: FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

foundationdb-ci, Nov 23 '22

It probably still needs some work, but I believe most of the initial implementation is covered.

To keep track of the process group locality, I have added a new field, ProcessGroupLocalityZoneID, to the cluster status; it is updated by the status update sub-reconciler.
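A minimal sketch of how such a status field could be consumed. The comment does not describe the field's actual shape, so this assumes it maps each process group ID to the zone it was placed in; `processGroupZones` and `countPerZone` are hypothetical names used only for illustration.

```go
package main

// processGroupZones is a hypothetical stand-in for the ProcessGroupLocalityZoneID
// status field: process group ID -> zone the process group was placed in.
type processGroupZones map[string]string

// countPerZone returns the number of process groups assigned to each zone.
// Every zone in knownZones starts at zero so that an empty zone is still
// visible to the placement logic.
func countPerZone(status processGroupZones, knownZones []string) map[string]int {
	counts := make(map[string]int, len(knownZones))
	for _, zone := range knownZones {
		counts[zone] = 0
	}
	for _, zone := range status {
		counts[zone]++
	}
	return counts
}
```

Seeding the map with every known zone matters for placement: a zone with no process groups yet would otherwise never appear as a candidate.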

The idea is to set the Pod nodeSelector based on the distribution of the process groups, placing new Pods in the zone with the fewest process groups.
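The placement rule could be sketched in Go as follows. This assumes a per-zone count of process groups is already available and that nodes carry the well-known Kubernetes label `topology.kubernetes.io/zone`; both the choice of label and the helper names are assumptions for illustration, not the operator's code.

```go
package main

import "sort"

// zoneLabel is the well-known Kubernetes node label for zones. Using it as the
// nodeSelector key is an assumption made for this sketch.
const zoneLabel = "topology.kubernetes.io/zone"

// leastLoadedZone returns the zone with the fewest process groups. Ties are
// broken alphabetically so the choice is deterministic across reconciles.
func leastLoadedZone(counts map[string]int) (string, bool) {
	if len(counts) == 0 {
		return "", false
	}
	zones := make([]string, 0, len(counts))
	for zone := range counts {
		zones = append(zones, zone)
	}
	sort.Strings(zones)
	best := zones[0]
	for _, zone := range zones[1:] {
		if counts[zone] < counts[best] {
			best = zone
		}
	}
	return best, true
}

// nodeSelectorForNewPod pins a new Pod to the least loaded zone, or returns
// nil when no zone information is available.
func nodeSelectorForNewPod(counts map[string]int) map[string]string {
	zone, ok := leastLoadedZone(counts)
	if !ok {
		return nil
	}
	return map[string]string{zoneLabel: zone}
}
```

Sorting before comparing keeps the result stable even though Go map iteration order is random, so repeated reconciles with equal counts would pick the same zone.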

The zone is passed down to the Monitor configuration to set the locality_data_hall argument using the FDB_ZONE_ID variable.
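For illustration, rendering that setting in fdbmonitor's `key = value` style might look like the sketch below. It assumes the monitor configuration references environment variables with `$VAR` substitution; `localityLines` is a hypothetical helper, and the variable name is a parameter because the description above uses FDB_ZONE_ID for it.

```go
package main

import (
	"fmt"
	"strings"
)

// localityLines renders fdbmonitor-style "key = value" locality settings.
// dataHallVar names the environment variable that carries the data hall.
// Hypothetical helper, not the operator's actual config generator.
func localityLines(dataHallVar string) string {
	lines := []string{
		"locality_zoneid = $FDB_ZONE_ID",
		fmt.Sprintf("locality_data_hall = $%s", dataHallVar),
	}
	return strings.Join(lines, "\n")
}
```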

When removing Pods, we can follow the same approach: to avoid unbalancing the cluster, we can pick the zone with the most Pods and remove a process from it. (This is not yet implemented, but I added some comments where I believe the logic should go.)
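The removal rule is the mirror image of placement. A hypothetical sketch, assuming the status maps process group IDs to zones; which process group to remove within the chosen zone is an arbitrary but deterministic choice here (the last ID in sorted order), not something the PR specifies.

```go
package main

import "sort"

// mostLoadedZone returns the zone with the most process groups; ties are
// broken alphabetically so the result is deterministic.
func mostLoadedZone(counts map[string]int) (string, bool) {
	if len(counts) == 0 {
		return "", false
	}
	zones := make([]string, 0, len(counts))
	for zone := range counts {
		zones = append(zones, zone)
	}
	sort.Strings(zones)
	best := zones[0]
	for _, zone := range zones[1:] {
		if counts[zone] > counts[best] {
			best = zone
		}
	}
	return best, true
}

// removalCandidate picks a process group to remove from the most loaded zone.
// processGroupZones maps process group ID -> zone (assumed status shape).
func removalCandidate(processGroupZones map[string]string) (string, bool) {
	counts := make(map[string]int)
	for _, zone := range processGroupZones {
		counts[zone]++
	}
	zone, ok := mostLoadedZone(counts)
	if !ok {
		return "", false
	}
	var ids []string
	for id, z := range processGroupZones {
		if z == zone {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids)
	// Arbitrary but deterministic: remove the last ID in sorted order.
	return ids[len(ids)-1], true
}
```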

manfontan, Nov 24 '22

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 7be57e0dd200c9d286bdc7dd2949ca1f2af38f02
  • Duration 4:07:55
  • Result: :x: FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

foundationdb-ci, Nov 24 '22

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: d4263f172d4afe86e32163683b8f928220fe3add
  • Duration 4:07:56
  • Result: :x: FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

foundationdb-ci, Nov 24 '22

> I'm not sure about the changes to and use of the ZONE_ID. I would think that for data hall we want a dedicated env variable rather than reusing the zone ID (since the zone ID is still used in three data hall mode).
>
> Could you add some more tests around the pod locality changes?

Agreed. My bad, I mixed things up a bit here. Thank you for the clarification.
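Following the review suggestion, a dedicated variable would keep the zone ID and the data hall separate. A sketch assuming a hypothetical `FDB_DATA_HALL` variable (not an existing operator setting) and a local stand-in for the Kubernetes env var type, so it has no external dependencies:

```go
package main

import "fmt"

// envVar is a local stand-in for a container environment variable.
type envVar struct {
	Name  string
	Value string
}

// dataHallEnvName is a hypothetical dedicated variable for the data hall.
const dataHallEnvName = "FDB_DATA_HALL"

// localityKeys maps each environment variable to the locality it feeds.
var localityKeys = map[string]string{
	"FDB_ZONE_ID":   "locality_zoneid",
	dataHallEnvName: "locality_data_hall",
}

// localityEnv keeps the zone ID and the data hall in separate variables, so
// the zone ID retains its own meaning in three data hall mode.
func localityEnv(zoneID, dataHall string) []envVar {
	return []envVar{
		{Name: "FDB_ZONE_ID", Value: zoneID},
		{Name: dataHallEnvName, Value: dataHall},
	}
}

// monitorLocalityLines renders one fdbmonitor-style line per known variable,
// in the order the variables are given.
func monitorLocalityLines(vars []envVar) []string {
	lines := make([]string, 0, len(vars))
	for _, v := range vars {
		if key, ok := localityKeys[v.Name]; ok {
			lines = append(lines, fmt.Sprintf("%s = $%s", key, v.Name))
		}
	}
	return lines
}
```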

manfontan, Dec 07 '22

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 12b9cb9d48fac8394bee43d62d9e43e9df828aed
  • Duration 4:09:54
  • Result: :x: FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

foundationdb-ci, Dec 07 '22

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 0a082f4ec8db6cb482ef5ec0c4cf281630513601
  • Duration 4:10:00
  • Result: :x: FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

foundationdb-ci, Dec 07 '22

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 169f334519aaec18194655cec87598fcad74fc53
  • Duration 4:09:42
  • Result: :x: FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

foundationdb-ci, Dec 07 '22