Add Support for HA Deployment Profiles for Big Data Components in Kubernetes
🆕 Title: Enhancement Proposal: Add Support for HA Deployment Profiles for Big Data Components in Kubernetes
### Is your feature request related to a problem? Please describe.
CloudEon provides Kubernetes-based deployment and lifecycle management for open-source big data platforms. However, current deployment templates often assume default or single-instance setups, which may not meet the reliability needs of production clusters.
For users deploying components like HDFS, Hive, Spark, and Kafka at scale, especially in telecom and enterprise contexts, there is a strong need for pre-validated High Availability (HA) profiles.
✅ Describe the solution you'd like
I propose adding built-in HA deployment profiles for key components such as:
- HDFS NameNodes (Active-Standby)
- Kafka clusters with replication and a ZooKeeper ensemble
- Hive Metastore with failover
- Spark Standalone with redundant masters
These profiles would be:
- Declarative (via values.yaml or CRD templates)
- Toggled via a simple `profile: ha` or `profile: dev` parameter
- Shipped with optional Prometheus/Grafana hooks for monitoring readiness
This helps users quickly spin up resilient production-grade clusters without manually editing Helm charts or manifests.
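One way the `profile` toggle could behave is as an overlay: each profile supplies defaults, and anything the user sets explicitly wins. The Python sketch below only illustrates that idea. The `PROFILE_DEFAULTS` table, the `deep_merge` and `resolve_values` helpers, and the key names are hypothetical and are not part of CloudEon.

```python
import copy

# Hypothetical per-profile defaults; the key names are illustrative only.
PROFILE_DEFAULTS = {
    "dev": {"zookeeper": {"replicaCount": 1}, "kafka": {"replicas": 1}},
    "ha": {"zookeeper": {"replicaCount": 3}, "kafka": {"replicas": 3}},
}


def deep_merge(base, override):
    """Return a new dict in which values from `override` win, merged recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_values(user_values):
    """Overlay user values on the defaults of the selected profile (assumed default: dev)."""
    profile = user_values.get("global", {}).get("profile", "dev")
    if profile not in PROFILE_DEFAULTS:
        raise ValueError(f"unknown profile: {profile!r}")
    return deep_merge(PROFILE_DEFAULTS[profile], user_values)
```

With this shape, a user who sets `profile: ha` and overrides only `kafka.replicas` would still inherit the three-node ZooKeeper default from the profile.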
🔄 Describe alternatives you’ve considered
I have considered manually customizing the Helm charts for each component, but this increases the risk of errors and breaks consistency across deployments. Providing curated HA templates allows for standardized and tested deployments across the user base.
📌 Additional context
I work as a Senior Systems Architect focused on cloud-native automation and infrastructure design for real-time telecom workloads. While I don’t contribute code directly, I regularly support platform teams in:
- Defining deployment standards
- Hardening big data stacks for high traffic
- Aligning cluster setup with SLAs and CI/CD workflows
I’d be happy to assist in designing the profiles, testing configuration logic, and reviewing draft templates.
🚀 Benefits to the CloudEon Project
- Helps users scale from POC to production seamlessly
- Increases CloudEon's appeal in enterprise/telco environments
- Reduces user friction and errors during configuration
- Promotes best practices in resource resilience and failover handling
📦 Sample HA Profile
values-ha.yaml
    global:
      profile: ha
      monitoring:
        enabled: true
        prometheusOperator: true

    hdfs:
      nameNode:
        replicas: 2
        mode: ha
        haEnabled: true
      journalNode:
        enabled: true
        replicas: 3
      dataNode:
        replicas: 3
      config:
        dfs.replication: 3

    zookeeper:
      enabled: true
      replicaCount: 3

    kafka:
      replicas: 3
      persistence:
        enabled: true
        size: 50Gi
      externalAccess:
        enabled: true
      config:
        offsets.topic.replication.factor: 3
        transaction.state.log.replication.factor: 3
        default.replication.factor: 3

    hive:
      metastore:
        replicas: 2
        haEnabled: true
        readinessProbe:
          enabled: true
      config:
        hive.metastore.event.db.notification.listener: org.apache.hive.hcatalog.listener.DbNotificationListener

    spark:
      master:
        replicas: 2
        haMode: true
      worker:
        replicas: 3
      persistence:
        enabled: true
        size: 20Gi

    ingress:
      enabled: true
      annotations:
        nginx.ingress.kubernetes.io/rewrite-target: /
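A profile like this is easier to trust if a few quorum and replication invariants are checked before deployment. The Python sketch below shows the kind of check that could be run against the sample values. The `check_ha_values` function is hypothetical, it takes the values as a plain dict (YAML parsing is left out), and the key layout assumes the nesting that the sample appears to intend. The "odd and at least 3" rule is the usual recommendation for ZooKeeper and JournalNode quorums, not a CloudEon requirement.

```python
def check_ha_values(values):
    """Return a list of human-readable problems found in an HA values dict."""
    problems = []

    # Quorum-based services are usually sized to an odd member count of at
    # least 3, so that a majority survives the loss of one member.
    hdfs = values.get("hdfs", {})
    quorums = {
        "zookeeper.replicaCount": values.get("zookeeper", {}).get("replicaCount", 0),
        "hdfs.journalNode.replicas": hdfs.get("journalNode", {}).get("replicas", 0),
    }
    for name, count in quorums.items():
        if count < 3 or count % 2 == 0:
            problems.append(f"{name}={count}: expected an odd number >= 3")

    # Kafka cannot place more replicas of a partition than there are brokers.
    kafka = values.get("kafka", {})
    brokers = kafka.get("replicas", 0)
    for key, factor in kafka.get("config", {}).items():
        if key.endswith("replication.factor") and factor > brokers:
            problems.append(f"kafka {key}={factor} exceeds broker count {brokers}")

    # A block replication factor above the DataNode count leaves blocks
    # under-replicated.
    data_nodes = hdfs.get("dataNode", {}).get("replicas", 0)
    dfs_replication = hdfs.get("config", {}).get("dfs.replication", 0)
    if dfs_replication > data_nodes:
        problems.append(
            f"hdfs dfs.replication={dfs_replication} exceeds DataNode count {data_nodes}"
        )

    return problems
```

For the sample profile (three ZooKeeper nodes, three JournalNodes, three Kafka brokers, and three DataNodes, with every replication factor set to 3) this check returns an empty list. Dropping `zookeeper.replicaCount` to 2 would produce one reported problem.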