🆕 Title:

Enhancement Proposal: Add Support for HA Deployment Profiles for Big Data Components in Kubernetes

### Is your feature request related to a problem? Please describe.

CloudEon provides Kubernetes-based deployment and lifecycle management for open-source big data platforms. However, current deployment templates often assume default or single-instance setups, which may not meet the reliability needs of production clusters.

For users deploying components like HDFS, Hive, Spark, and Kafka at scale — especially in telecom and enterprise contexts — there is a strong need for pre-validated High Availability (HA) profiles.

✅ Describe the solution you'd like

I propose adding built-in HA deployment profiles for key components such as:

- HDFS NameNodes (Active-Standby)
- Kafka clusters with replication and a ZooKeeper ensemble
- Hive Metastore with failover
- Spark Standalone with redundant masters

These profiles would be:

- Declarative (via values.yaml or CRD templates)
- Toggled via a simple `profile: ha` or `profile: dev` parameter
- Equipped with optional Prometheus/Grafana hooks for monitoring readiness

This helps users quickly spin up resilient production-grade clusters without manually editing Helm charts or manifests.
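As a rough illustration of the toggle, a `dev` profile could be as small as the sketch below. The file name and every key in it are assumptions made for this proposal, not CloudEon's current chart schema; the point is only that switching `global.profile` would select a different set of replica counts.

```yaml
# values-dev.yaml (hypothetical file name and keys, for illustration only)
global:
  profile: dev          # assumed toggle; `ha` would select the HA profile instead

hdfs:
  nameNode:
    replicas: 1         # single NameNode, no standby
  dataNode:
    replicas: 1

zookeeper:
  replicaCount: 1       # single node, no quorum

kafka:
  replicas: 1           # single broker, so no replication is possible
```

A single-instance profile like this would suit a POC, while the HA profile would raise the same keys to redundant values.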

🔄 Describe alternatives you’ve considered

Manually customizing Helm charts per component is possible, but it invites errors and breaks consistency across deployments. Curated HA templates would instead give the whole user base standardized, tested deployments.

📌 Additional context

I work as a Senior Systems Architect focused on cloud-native automation and infrastructure design for real-time telecom workloads. While I don’t contribute code directly, I regularly support platform teams in:

- Defining deployment standards
- Hardening big data stacks for high traffic
- Aligning cluster setup with SLAs and CI/CD workflows

I’d be happy to assist in designing the profiles, testing configuration logic, and reviewing draft templates.

🚀 Benefits to the CloudEon Project

- Helps users scale from POC to production seamlessly
- Increases CloudEon's appeal in enterprise/telco environments
- Reduces user friction and errors during configuration
- Promotes best practices in resource resilience and failover handling

Apr 17 '25 21:04 jigarpatel1007

Hello, this is Xie Jinfeng. I have received your email. Thank you!

Apr 17 '25 21:04 xlostpath

📦 Sample HA Profile

values-ha.yaml

    global:
      profile: ha
      monitoring:
        enabled: true
        prometheusOperator: true

    hdfs:
      nameNode:
        replicas: 2
        mode: ha
        haEnabled: true
      journalNode:
        enabled: true
        replicas: 3
      dataNode:
        replicas: 3
      config:
        dfs.replication: 3

    zookeeper:
      enabled: true
      replicaCount: 3

    kafka:
      replicas: 3
      persistence:
        enabled: true
        size: 50Gi
      externalAccess:
        enabled: true
      config:
        offsets.topic.replication.factor: 3
        transaction.state.log.replication.factor: 3
        default.replication.factor: 3

    hive:
      metastore:
        replicas: 2
        haEnabled: true
        readinessProbe:
          enabled: true
      config:
        hive.metastore.event.db.notification.listener: org.apache.hive.hcatalog.listener.DbNotificationListener

    spark:
      master:
        replicas: 2
        haMode: true
      worker:
        replicas: 3
      persistence:
        enabled: true
        size: 20Gi

    ingress:
      enabled: true
      annotations:
        nginx.ingress.kubernetes.io/rewrite-target: /

Apr 17 '25 21:04 jigarpatel1007