HBASE-27574 Implement ClusterManager interface for Kubernetes
A basic implementation that supports taking destructive actions. Assume that services are running behind a resilient Deployment of some kind, and that the cluster will handle starting up replacement processes. Requires specification of a scoping namespace.
Transitive hull of the new dependencies. Should we be shading these?
[INFO] +- io.kubernetes:client-java:jar:17.0.0:test
[INFO] | +- io.prometheus:simpleclient:jar:0.15.0:test
[INFO] | | +- io.prometheus:simpleclient_tracer_otel:jar:0.15.0:test
[INFO] | | | \- io.prometheus:simpleclient_tracer_common:jar:0.15.0:test
[INFO] | | \- io.prometheus:simpleclient_tracer_otel_agent:jar:0.15.0:test
[INFO] | +- io.prometheus:simpleclient_httpserver:jar:0.15.0:test
[INFO] | | \- io.prometheus:simpleclient_common:jar:0.15.0:test
[INFO] | +- io.kubernetes:client-java-proto:jar:17.0.0:test
[INFO] | +- org.yaml:snakeyaml:jar:1.33:test
[INFO] | +- org.apache.commons:commons-compress:jar:1.22:compile
[INFO] | +- org.bouncycastle:bcpkix-jdk18on:jar:1.72:test
[INFO] | | +- org.bouncycastle:bcprov-jdk18on:jar:1.72:test
[INFO] | | \- org.bouncycastle:bcutil-jdk18on:jar:1.72:test
[INFO] | +- com.google.protobuf:protobuf-java:jar:3.21.10:compile
[INFO] | +- org.apache.commons:commons-collections4:jar:4.4:test
[INFO] | \- org.bitbucket.b_c:jose4j:jar:0.9.2:test
[INFO] +- io.kubernetes:client-java-api:jar:17.0.0:test
[INFO] | +- io.swagger:swagger-annotations:jar:1.6.9:test
[INFO] | +- com.squareup.okhttp3:okhttp:jar:4.10.0:test
[INFO] | | +- com.squareup.okio:okio-jvm:jar:3.0.0:test
[INFO] | | | \- org.jetbrains.kotlin:kotlin-stdlib-common:jar:1.5.31:test
[INFO] | | \- org.jetbrains.kotlin:kotlin-stdlib:jar:1.6.20:test
[INFO] | |   \- org.jetbrains:annotations:jar:13.0:test
[INFO] | +- com.squareup.okhttp3:logging-interceptor:jar:4.10.0:test
[INFO] | | \- org.jetbrains.kotlin:kotlin-stdlib-jdk8:jar:1.6.10:test
[INFO] | |   \- org.jetbrains.kotlin:kotlin-stdlib-jdk7:jar:1.6.10:test
[INFO] | +- com.google.code.gson:gson:jar:2.10:compile
[INFO] | \- io.gsonfire:gson-fire:jar:1.8.5:test
:confetti_ball: +1 overall
| Vote | Subsystem | Runtime | Comment |
|---|---|---|---|
| +0 :ok: | reexec | 0m 33s | Docker mode activated. |
| -0 :warning: | yetus | 0m 2s | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck |
| _ Prechecks _ | |||
| _ master Compile Tests _ | |||
| +1 :green_heart: | mvninstall | 2m 27s | master passed |
| +1 :green_heart: | compile | 0m 15s | master passed |
| +1 :green_heart: | shadedjars | 4m 3s | branch has no errors when building our shaded downstream artifacts. |
| +1 :green_heart: | javadoc | 0m 11s | master passed |
| _ Patch Compile Tests _ | |||
| +1 :green_heart: | mvninstall | 2m 6s | the patch passed |
| +1 :green_heart: | compile | 0m 16s | the patch passed |
| +1 :green_heart: | javac | 0m 16s | the patch passed |
| +1 :green_heart: | shadedjars | 4m 2s | patch has no errors when building our shaded downstream artifacts. |
| +1 :green_heart: | javadoc | 0m 9s | the patch passed |
| _ Other Tests _ | |||
| +1 :green_heart: | unit | 0m 36s | hbase-it in the patch passed. |
| 15m 48s |
| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4979/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile |
| GITHUB PR | https://github.com/apache/hbase/pull/4979 |
| Optional Tests | javac javadoc unit shadedjars compile |
| uname | Linux b92ac772cc91 5.4.0-1093-aws #102~18.04.2-Ubuntu SMP Wed Dec 7 00:31:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / da261344cc |
| Default Java | Temurin-1.8.0_352-b08 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4979/1/testReport/ |
| Max. process+thread count | 554 (vs. ulimit of 30000) |
| modules | C: hbase-it U: hbase-it |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4979/1/console |
| versions | git=2.34.1 maven=3.8.6 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
This message was automatically generated.
:confetti_ball: +1 overall
| Vote | Subsystem | Runtime | Comment |
|---|---|---|---|
| +0 :ok: | reexec | 1m 4s | Docker mode activated. |
| -0 :warning: | yetus | 0m 4s | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck |
| _ Prechecks _ | |||
| _ master Compile Tests _ | |||
| +1 :green_heart: | mvninstall | 2m 48s | master passed |
| +1 :green_heart: | compile | 0m 19s | master passed |
| +1 :green_heart: | shadedjars | 3m 53s | branch has no errors when building our shaded downstream artifacts. |
| +1 :green_heart: | javadoc | 0m 14s | master passed |
| _ Patch Compile Tests _ | |||
| +1 :green_heart: | mvninstall | 2m 33s | the patch passed |
| +1 :green_heart: | compile | 0m 20s | the patch passed |
| +1 :green_heart: | javac | 0m 20s | the patch passed |
| +1 :green_heart: | shadedjars | 3m 49s | patch has no errors when building our shaded downstream artifacts. |
| +1 :green_heart: | javadoc | 0m 13s | the patch passed |
| _ Other Tests _ | |||
| +1 :green_heart: | unit | 0m 42s | hbase-it in the patch passed. |
| 16m 50s |
| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4979/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
| GITHUB PR | https://github.com/apache/hbase/pull/4979 |
| Optional Tests | javac javadoc unit shadedjars compile |
| uname | Linux 1eab1961a285 5.4.0-131-generic #147-Ubuntu SMP Fri Oct 14 17:07:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / da261344cc |
| Default Java | Eclipse Adoptium-11.0.17+8 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4979/1/testReport/ |
| Max. process+thread count | 591 (vs. ulimit of 30000) |
| modules | C: hbase-it U: hbase-it |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4979/1/console |
| versions | git=2.34.1 maven=3.8.6 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
This message was automatically generated.
:confetti_ball: +1 overall
| Vote | Subsystem | Runtime | Comment |
|---|---|---|---|
| +0 :ok: | reexec | 0m 35s | Docker mode activated. |
| _ Prechecks _ | |||
| +1 :green_heart: | dupname | 0m 0s | No case conflicting files found. |
| +1 :green_heart: | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 :green_heart: | @author | 0m 0s | The patch does not contain any @author tags. |
| _ master Compile Tests _ | |||
| +1 :green_heart: | mvninstall | 2m 43s | master passed |
| +1 :green_heart: | compile | 0m 24s | master passed |
| +1 :green_heart: | checkstyle | 0m 10s | master passed |
| +1 :green_heart: | spotless | 0m 41s | branch has no errors when running spotless:check. |
| +1 :green_heart: | spotbugs | 0m 23s | master passed |
| _ Patch Compile Tests _ | |||
| +1 :green_heart: | mvninstall | 2m 23s | the patch passed |
| +1 :green_heart: | compile | 0m 24s | the patch passed |
| +1 :green_heart: | javac | 0m 24s | the patch passed |
| +1 :green_heart: | checkstyle | 0m 8s | the patch passed |
| +1 :green_heart: | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 :green_heart: | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 :green_heart: | hadoopcheck | 9m 4s | Patch does not cause any errors with Hadoop 3.2.4 3.3.4. |
| +1 :green_heart: | spotless | 0m 37s | patch has no errors when running spotless:check. |
| +1 :green_heart: | spotbugs | 0m 28s | the patch passed |
| _ Other Tests _ | |||
| +1 :green_heart: | asflicense | 0m 9s | The patch does not generate ASF License warnings. |
| 23m 50s |
| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4979/1/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | https://github.com/apache/hbase/pull/4979 |
| Optional Tests | dupname asflicense javac hadoopcheck spotless xml compile spotbugs hbaseanti checkstyle |
| uname | Linux a1ff09085b00 5.4.0-1093-aws #102~18.04.2-Ubuntu SMP Wed Dec 7 00:31:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / da261344cc |
| Default Java | Eclipse Adoptium-11.0.17+8 |
| Max. process+thread count | 80 (vs. ulimit of 30000) |
| modules | C: hbase-it U: hbase-it |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4979/1/console |
| versions | git=2.34.1 maven=3.8.6 spotbugs=4.7.3 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
This message was automatically generated.
To work better with K8s, I think we need to discuss further how to define the actions...
A simple kill of a region server is not enough to reduce the number of region servers, as the deployment will soon start a new one...
Simply setting the number of pods is not applicable here, as we cannot control which region server it will kill...
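The two concerns above can be illustrated with a toy model. This is a minimal sketch, not Kubernetes code: `ToyDeployment`, its methods, and the pod names are hypothetical, and the rule that scale-down removes the newest pod is an assumption made only to keep the example deterministic.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a Deployment-style controller; not the Kubernetes API. */
class ToyDeployment {
  private final List<String> pods = new ArrayList<>();
  private int desiredReplicas;
  private int nextId = 0;

  ToyDeployment(int desiredReplicas) {
    this.desiredReplicas = desiredReplicas;
    reconcile();
  }

  /** Start or stop pods until the running count matches the desired count. */
  void reconcile() {
    while (pods.size() < desiredReplicas) {
      pods.add("regionserver-" + nextId++);
    }
    while (pods.size() > desiredReplicas) {
      // The controller, not the caller, picks the victim (assumed here: the newest pod).
      pods.remove(pods.size() - 1);
    }
  }

  /** Chaos-style action: delete one named pod; the controller then replaces it. */
  void deletePod(String name) {
    pods.remove(name);
    reconcile();
  }

  /** Scale action: change the count; the caller cannot name the pod to remove. */
  void scale(int replicas) {
    desiredReplicas = replicas;
    reconcile();
  }

  /** Snapshot of the currently running pod names. */
  List<String> pods() {
    return new ArrayList<>(pods);
  }
}
```

In this model, `deletePod("regionserver-0")` on a three-replica deployment still leaves three pods running, and `scale(2)` removes a pod that the caller did not choose.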
I agree that there is a longer arc of discussion re: HBase and a container runtime platform like Kubernetes. I believe that Kubernetes implements its own form of chaos, and I have not yet explored an implementation based on that tooling.
However, just like the CM-based and coprocessor-based implementations before it, this has allowed me to use our existing ITBLL + Chaos tools in an environment that is convenient to what I have available to me in my organization. It's convenient to be able to run the same processes in the new deployment environment and have everything basically function. I'd like to share it with the community, especially if there's a path to us using similar tools as part of our project's resource budget.
Then let's not use the generic "KubernetesClusterManager" as the name? Maybe later we will have other types of K8s cluster manager...
Rebased and addressed PR feedback. These changes are untested.
:confetti_ball: +1 overall
| Vote | Subsystem | Runtime | Comment |
|---|---|---|---|
| +0 :ok: | reexec | 0m 25s | Docker mode activated. |
| -0 :warning: | yetus | 0m 4s | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck |
| _ Prechecks _ | |||
| _ master Compile Tests _ | |||
| +1 :green_heart: | mvninstall | 2m 55s | master passed |
| +1 :green_heart: | compile | 0m 17s | master passed |
| +1 :green_heart: | shadedjars | 4m 15s | branch has no errors when building our shaded downstream artifacts. |
| +1 :green_heart: | javadoc | 0m 13s | master passed |
| -0 :warning: | patch | 4m 37s | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| _ Patch Compile Tests _ | |||
| +1 :green_heart: | mvninstall | 2m 45s | the patch passed |
| +1 :green_heart: | compile | 0m 17s | the patch passed |
| +1 :green_heart: | javac | 0m 17s | the patch passed |
| +1 :green_heart: | shadedjars | 4m 13s | patch has no errors when building our shaded downstream artifacts. |
| +1 :green_heart: | javadoc | 0m 11s | the patch passed |
| _ Other Tests _ | |||
| +1 :green_heart: | unit | 0m 38s | hbase-it in the patch passed. |
| 16m 59s |
| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4979/2/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile |
| GITHUB PR | https://github.com/apache/hbase/pull/4979 |
| Optional Tests | javac javadoc unit shadedjars compile |
| uname | Linux b8f6afff9ae5 5.4.0-131-generic #147-Ubuntu SMP Fri Oct 14 17:07:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 913cf6b96d |
| Default Java | Temurin-1.8.0_352-b08 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4979/2/testReport/ |
| Max. process+thread count | 557 (vs. ulimit of 30000) |
| modules | C: hbase-it U: hbase-it |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4979/2/console |
| versions | git=2.34.1 maven=3.8.6 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
This message was automatically generated.
:confetti_ball: +1 overall
| Vote | Subsystem | Runtime | Comment |
|---|---|---|---|
| +0 :ok: | reexec | 0m 47s | Docker mode activated. |
| -0 :warning: | yetus | 0m 3s | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck |
| _ Prechecks _ | |||
| _ master Compile Tests _ | |||
| +1 :green_heart: | mvninstall | 3m 47s | master passed |
| +1 :green_heart: | compile | 0m 19s | master passed |
| +1 :green_heart: | shadedjars | 4m 49s | branch has no errors when building our shaded downstream artifacts. |
| +1 :green_heart: | javadoc | 0m 13s | master passed |
| -0 :warning: | patch | 5m 8s | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| _ Patch Compile Tests _ | |||
| +1 :green_heart: | mvninstall | 3m 27s | the patch passed |
| +1 :green_heart: | compile | 0m 17s | the patch passed |
| +1 :green_heart: | javac | 0m 17s | the patch passed |
| +1 :green_heart: | shadedjars | 5m 47s | patch has no errors when building our shaded downstream artifacts. |
| +1 :green_heart: | javadoc | 0m 17s | the patch passed |
| _ Other Tests _ | |||
| +1 :green_heart: | unit | 0m 51s | hbase-it in the patch passed. |
| 21m 21s |
| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4979/2/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
| GITHUB PR | https://github.com/apache/hbase/pull/4979 |
| Optional Tests | javac javadoc unit shadedjars compile |
| uname | Linux b00f921f88c8 5.4.0-1088-aws #96~18.04.1-Ubuntu SMP Mon Oct 17 02:57:48 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 913cf6b96d |
| Default Java | Eclipse Adoptium-11.0.17+8 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4979/2/testReport/ |
| Max. process+thread count | 580 (vs. ulimit of 30000) |
| modules | C: hbase-it U: hbase-it |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4979/2/console |
| versions | git=2.34.1 maven=3.8.6 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
This message was automatically generated.
:confetti_ball: +1 overall
| Vote | Subsystem | Runtime | Comment |
|---|---|---|---|
| +0 :ok: | reexec | 0m 35s | Docker mode activated. |
| _ Prechecks _ | |||
| +1 :green_heart: | dupname | 0m 0s | No case conflicting files found. |
| +1 :green_heart: | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 :green_heart: | @author | 0m 0s | The patch does not contain any @author tags. |
| _ master Compile Tests _ | |||
| +1 :green_heart: | mvninstall | 3m 32s | master passed |
| +1 :green_heart: | compile | 0m 24s | master passed |
| +1 :green_heart: | checkstyle | 0m 10s | master passed |
| +1 :green_heart: | spotless | 0m 41s | branch has no errors when running spotless:check. |
| +1 :green_heart: | spotbugs | 0m 23s | master passed |
| -0 :warning: | patch | 0m 29s | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| _ Patch Compile Tests _ | |||
| +1 :green_heart: | mvninstall | 3m 12s | the patch passed |
| +1 :green_heart: | compile | 0m 23s | the patch passed |
| +1 :green_heart: | javac | 0m 23s | the patch passed |
| +1 :green_heart: | checkstyle | 0m 9s | the patch passed |
| +1 :green_heart: | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 :green_heart: | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 :green_heart: | hadoopcheck | 12m 47s | Patch does not cause any errors with Hadoop 3.2.4 3.3.4. |
| +1 :green_heart: | spotless | 0m 38s | patch has no errors when running spotless:check. |
| +1 :green_heart: | spotbugs | 0m 26s | the patch passed |
| _ Other Tests _ | |||
| +1 :green_heart: | asflicense | 0m 8s | The patch does not generate ASF License warnings. |
| 30m 52s |
| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4979/2/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | https://github.com/apache/hbase/pull/4979 |
| Optional Tests | dupname asflicense javac hadoopcheck spotless xml compile spotbugs hbaseanti checkstyle |
| uname | Linux f7b7761c43f4 5.4.0-1093-aws #102~18.04.2-Ubuntu SMP Wed Dec 7 00:31:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 913cf6b96d |
| Default Java | Eclipse Adoptium-11.0.17+8 |
| Max. process+thread count | 86 (vs. ulimit of 30000) |
| modules | C: hbase-it U: hbase-it |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4979/2/console |
| versions | git=2.34.1 maven=3.8.6 spotbugs=4.7.3 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
This message was automatically generated.
That's a good point, @bbeaudreault. I wonder if the current ssh-based ClusterManager can be coerced into running `kubectl exec` instead of `ssh`.
Checked the API description
https://github.com/kubernetes-client/java/blob/master/kubernetes/docs/CoreV1Api.md#deleteNamespacedPod
The deleteNamespacedPod method has a gracePeriodSeconds parameter, where 0 means delete immediately, so I think it could achieve what we want.
But what concerns me more is how to correctly support stop and kill, since in K8s, if you do not change the replica count, the framework will launch a new pod right after you delete one...
I think this is exactly what we want, but it seems it is not fully implemented yet... https://github.com/kubernetes/kubernetes/issues/45509
And we also need to change some semantics for the cluster manager. For example, on K8s it is useless to specify a hostname when starting a new region server, so maybe we could change the API to "startNewRegionServer". Even for a non-K8s environment, I do not think we must start a region server on a given host; we just need to start a new one, right?
And for stop, kill, and restart, maybe we could also change the semantics so they fit both K8s and non-K8s environments. For example, we remove stop and kill and keep only restart, with a flag that indicates how to stop the region server, i.e., a graceful shutdown or a force kill. And we provide another API called reduceRegionServerNumber. For a K8s environment it is just an API call, and for a non-K8s environment we can randomly select a region server to stop. This is not perfect, but I think it could fit most of our test scenarios.
What do you guys think?
Thanks.
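The API shape proposed above could look roughly like the following. This is only a sketch under stated assumptions: `StopMode`, `RegionServerChaosApi`, and `InMemoryChaosApi` are hypothetical names that do not exist in HBase, and the in-memory class merely imitates the non-K8s case of picking a random region server to stop. The `gracePeriodSeconds()` helper reflects the point that 0 means delete immediately; returning `null` is meant to leave the pod's own default in effect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** How a region server should be taken down during a restart. */
enum StopMode {
  GRACEFUL, FORCE;

  /** Value for gracePeriodSeconds on pod deletion: 0 deletes immediately, null keeps the default. */
  Integer gracePeriodSeconds() {
    return this == FORCE ? Integer.valueOf(0) : null;
  }
}

/** Hypothetical environment-neutral API; these names are not part of HBase. */
interface RegionServerChaosApi {
  /** Start one more region server somewhere; no hostname is specified. */
  String startNewRegionServer();

  /** Restart the given server, stopping it gracefully or by force. */
  void restart(String server, StopMode mode);

  /** Remove one region server chosen by the implementation; returns its name. */
  String reduceRegionServerNumber();
}

/** In-memory stand-in for the non-K8s case: a random server is selected to stop. */
class InMemoryChaosApi implements RegionServerChaosApi {
  private final List<String> servers = new ArrayList<>();
  private final List<String> actions = new ArrayList<>();
  private final Random random;
  private int nextId = 0;

  InMemoryChaosApi(int initialServers, long seed) {
    this.random = new Random(seed);
    for (int i = 0; i < initialServers; i++) {
      servers.add("rs-" + nextId++);
    }
  }

  @Override
  public String startNewRegionServer() {
    String name = "rs-" + nextId++;
    servers.add(name);
    return name;
  }

  @Override
  public void restart(String server, StopMode mode) {
    if (!servers.contains(server)) {
      throw new IllegalArgumentException("unknown server: " + server);
    }
    actions.add("restart " + server + " " + mode);
  }

  @Override
  public String reduceRegionServerNumber() {
    if (servers.isEmpty()) {
      throw new IllegalStateException("no region servers left");
    }
    return servers.remove(random.nextInt(servers.size()));
  }

  List<String> servers() {
    return new ArrayList<>(servers);
  }

  List<String> actions() {
    return new ArrayList<>(actions);
  }
}
```

A K8s-backed implementation would presumably map `reduceRegionServerNumber` to a replica-count change and `restart` to a pod deletion with the matching grace period, but that part is not sketched here.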
And I'm a bit interested in how you guys manage datanodes or namenodes on K8s. They have local storage, so if you delete the pod and launch a new one elsewhere, the data will be lost...
Do you use a StatefulSet?
Thanks.
Personally I prefer to use the Exec API for this. It seems somewhat artificial to try reducing the pod count just for the sake of it.
IMO chaos monkey is for testing both HBase handling and deployment automation. Outside k8s, if you stop a regionserver process you had better have monit or sysctl to start it back up. In Kubernetes, this is handled for you.
So if chaos sends a kill -9, it's doing a good job of testing how both systems handle a regionserver dying. Maybe in Kubernetes you have an init container which gets in the way of the pod gracefully handling a regionserver container dying. Chaos would expose that.
Otherwise I think kill -stop is an important feature and I wouldn't want to bury it in an option. So that's another reason just replacing ssh with the Exec API would be nice.
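As a rough illustration of swapping ssh for an exec into the pod, the sketch below only builds a `kubectl exec` command line that sends a signal to the region server process. It is not HBase's ClusterManager: `KubectlExecCommand` and its parameters are hypothetical, and it assumes that `pkill` is available in the container image and that the process can be matched by a pattern such as `HRegionServer`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Builds kubectl exec command lines for signal-based chaos actions; a sketch only. */
class KubectlExecCommand {
  /** Signals of interest: KILL (kill -9), STOP (suspend), CONT (resume). */
  enum Signal { KILL, STOP, CONT }

  /**
   * Returns the argument list for running pkill inside the given pod and container.
   * Assumes pkill exists in the image; processPattern is matched against the full command line.
   */
  static List<String> signalCommand(String namespace, String pod, String container,
      Signal signal, String processPattern) {
    List<String> cmd = new ArrayList<>(Arrays.asList(
      "kubectl", "exec", "--namespace", namespace, pod, "--container", container, "--"));
    cmd.addAll(Arrays.asList("pkill", "-" + signal.name(), "-f", processPattern));
    return cmd;
  }
}
```

The returned list can be handed to `new ProcessBuilder(cmd)`. Because the signal is delivered to the process rather than the pod being deleted, suspend and resume remain available, which is the point made above about not burying kill -stop in an option.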
> And I'm a bit interested in how you guys manage datanodes or namenodes on K8s. They have local storage, so if you delete the pod and launch a new one elsewhere, the data will be lost
We currently don't run DataNodes in k8s. For NameNodes we use a StatefulSet backed by EBS.
For DataNodes we don't want to use EBS because it is too expensive. When we eventually get to them, we plan to use FlexVolumes to basically provision space on particular SSD-backed kube nodes. So if a pod restarts, it would go to the same node if it's available. If not, it would go elsewhere and lose its data, but this is how things work outside k8s and is handled by HDFS replication. Sadly, I can't give more details than this right now because it's been on hold for a while so that we can work on other things.