
Kubernetes Operator Support in Control Center

Open jojule opened this issue 9 months ago • 0 comments

Description

Introduces a Kubernetes Operator in Vaadin Control Center to enable declarative app management via CRDs, improving resilience, automation, and UI decoupling while unifying status and reducing configuration drift. This architecture change makes it easier to fully automate Continuous Deployment workflows.

Internal issue

Tier

Free

License

Proprietary

Motivation

Background

Vaadin Control Center currently manages Vaadin applications in Kubernetes by driving resource creation and updates directly from its UI layer. Each application deployment via Control Center triggers the ApplicationService (in the control-center-app module) to call the Kubernetes API and provision resources such as Deployments, ConfigMaps, Services, Ingresses, Certificates, and RBAC bindings. While functional, this model ties management to the UI, makes status aggregation difficult, and is susceptible to drift when external changes occur.

Kubernetes Operators define custom resources and reconciliation loops to manage complex applications more reliably. By introducing an App Custom Resource Definition (CRD) and a dedicated Control Center Operator, we can offload lifecycle management to a controller that:

  • Watches App resources and ensures all related Kubernetes objects match the desired spec
  • Provides declarative CI/CD integration via YAML manifests or CLI
  • Maintains an up-to-date combined status for each application
  • Runs independently of the UI, improving resilience and scalability
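
To make the reconciliation idea concrete, here is a minimal sketch of a level-triggered reconcile step. It is not the planned operator implementation: the in-memory dictionaries stand in for the Kubernetes API, and the "Kind/name" keys, `Action`, `plan_reconcile`, and `reconcile` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    verb: str  # "create", "update", or "delete"
    key: str   # hypothetical "Kind/name" identifier of a child object


def plan_reconcile(desired: dict, observed: dict) -> list:
    """Compare desired and observed objects and return the actions needed."""
    actions = []
    for key, spec in sorted(desired.items()):
        if key not in observed:
            actions.append(Action("create", key))
        elif observed[key] != spec:
            # Covers drift, e.g. a manual edit that changed a managed object.
            actions.append(Action("update", key))
    for key in sorted(observed):
        if key not in desired:
            actions.append(Action("delete", key))
    return actions


def reconcile(desired: dict, observed: dict) -> dict:
    """Apply the planned actions to an in-memory copy of the observed state."""
    state = dict(observed)
    for action in plan_reconcile(desired, observed):
        if action.verb == "delete":
            del state[action.key]
        else:
            state[action.key] = desired[action.key]
    return state
```

Running `reconcile` a second time on its own output plans no further actions, which is the idempotence property a reconcile loop relies on.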

Problem

  • UI-only Management: Apps can only be created or updated through the Control Center UI; no CLI/manifest-driven workflows.
  • Resource Drift: Changes made outside the UI (e.g., kubectl edit) may desynchronize cluster state from the intended spec.
  • Scattershot Status: The application’s health is inferred by aggregating statuses of individual resources, making it hard to display a unified view.
  • Availability & Load: The UI service itself handles orchestration; when it’s down or overloaded, app management is blocked.
  • Operator Visibility: Users cannot see operator installation or health in the UI.

Solution

Implement a Control Center Operator that:

  1. Defines an App CRD (apiVersion: vaadin.com/v1alpha1) representing a Vaadin application manifest.
  2. Reconciles every App resource to create/update:
       • Deployment (application pods)
       • Service (HTTP + internal Actuator)
       • Ingress (hostname routing)
       • ConfigMap (env, DB & Keycloak settings)
       • Certificate (via cert-manager)
       • RBAC components (Role, RoleBinding, ServiceAccount)
  3. Provisions supporting platforms when requested:
       • PostgreSQL cluster
       • Keycloak instance
  4. Computes and updates a combined .status field on the App CR with health, endpoints, and DNS/certificate status.
  5. Operates independently from the UI: changes can be applied via kubectl apply -f app.yaml, CI pipelines, or the existing UI.
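
As a rough illustration of the steps above, the following sketch derives the set of child objects the operator might own for a single App. The naming scheme (children named after the App, with `-postgres` and `-keycloak` suffixes) and the `PostgresCluster` and `Keycloak` kind labels are assumptions made for this example, not decisions recorded in this issue.

```python
def child_resources(app: dict) -> list:
    """Return (kind, name) pairs the operator would manage for one App."""
    name = app["metadata"]["name"]
    spec = app.get("spec") or {}
    children = [
        ("Deployment", name),
        ("Service", name),
        ("Ingress", name),
        ("ConfigMap", name),
        ("Certificate", name),
        ("ServiceAccount", name),
        ("Role", name),
        ("RoleBinding", name),
    ]
    # Supporting platforms are provisioned only when the spec requests them.
    if "postgres" in spec:
        children.append(("PostgresCluster", f"{name}-postgres"))  # hypothetical kind
    if "keycloak" in spec:
        children.append(("Keycloak", f"{name}-keycloak"))  # hypothetical kind
    return children
```

Keeping this mapping in one pure function would make it straightforward to unit test which objects an App is expected to produce.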

Notes

  • The operator will live in a separate module (control-center-operator) alongside existing code.
  • The UI continues to be supported but will emit App CRs instead of invoking the Kubernetes API directly.
  • DNS and TLS management leverages ExternalDNS and cert-manager integrations already documented in Control Center.
  • Realm provisioning (Keycloak) and database setup will reuse patterns from the prototype code in ticket 802.
  • Versioning of the Operator refers to the Operator’s container image tag (e.g., vaadin/control-center-operator:v0.1.0), not the Vaadin version.
  • Versioning of the CRD should start at v1alpha1, with clearly documented upgrade paths to v1beta1 and beyond.

Requirements

CRD & Reconciliation

  • [ ] CRD Definition:
      apiVersion: vaadin.com/v1alpha1
      kind: App
      metadata:
        name: my-app
      spec:
        host: my-app.example.com
        image: my-image:1.0.0
        replicas: 2
        postgres:
          database: my-db
        keycloak:
          realm: my-realm
          theme: lumo
  • [ ] Reconcile Loop: Must create/update all dependent resources based on spec.
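
Before reconciling, the operator would likely want to reject malformed specs. The sketch below checks the fields from the sample manifest above. Which fields are required, and the default of one replica, are assumptions for illustration; the real rules would come from the CRD schema.

```python
def validate_app_spec(spec: dict) -> list:
    """Return a list of human-readable validation errors (empty if valid)."""
    errors = []
    for field in ("host", "image"):
        value = spec.get(field)
        if not isinstance(value, str) or not value:
            errors.append(f"spec.{field} must be a non-empty string")

    replicas = spec.get("replicas", 1)  # assumption: default to one replica
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(replicas, bool) or not isinstance(replicas, int) or replicas < 0:
        errors.append("spec.replicas must be a non-negative integer")

    # Optional blocks need their key setting once they are present.
    for block, key in (("postgres", "database"), ("keycloak", "realm")):
        if block in spec and not (spec[block] or {}).get(key):
            errors.append(f"spec.{block}.{key} is required when spec.{block} is set")
    return errors
```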

Resource Management

Workloads & Networking:

  • [ ] Deployment, Service for app and actuator
  • [ ] Ingress configured with cert-manager annotation for Let’s Encrypt
  • [ ] ExternalDNS annotations on services/ingress
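
For orientation, this is roughly what a generated Ingress could look like when built as a plain dictionary. The `cert-manager.io/cluster-issuer` and `external-dns.alpha.kubernetes.io/hostname` annotations are the standard ones from those projects; the issuer name `letsencrypt-prod`, the `-tls` secret suffix, and port 8080 are placeholders assumed for this sketch.

```python
def build_ingress(name: str, host: str, service_port: int = 8080,
                  cluster_issuer: str = "letsencrypt-prod") -> dict:
    """Build a networking.k8s.io/v1 Ingress manifest for one App."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": name,
            "annotations": {
                # Asks cert-manager to issue a certificate for the TLS hosts.
                "cert-manager.io/cluster-issuer": cluster_issuer,
                # Asks ExternalDNS to publish a DNS record for the host.
                "external-dns.alpha.kubernetes.io/hostname": host,
            },
        },
        "spec": {
            "tls": [{"hosts": [host], "secretName": f"{name}-tls"}],
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {
                        "service": {"name": name, "port": {"number": service_port}},
                    },
                }]},
            }],
        },
    }
```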

Configuration:

  • [ ] ConfigMap for env vars, DB credentials, Keycloak OIDC settings
  • [ ] Kubernetes Secret injection for credentials (e.g. DO token, DB password)
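
One possible way to separate plain settings from credentials is sketched below. The key-name heuristic in `SENSITIVE_MARKERS` is purely an assumption for illustration; a real operator would more likely rely on explicit fields in the App spec than on guessing from names.

```python
SENSITIVE_MARKERS = ("PASSWORD", "TOKEN", "SECRET")  # assumed naming heuristic


def split_config(name: str, settings: dict) -> tuple:
    """Split settings into a ConfigMap manifest and a Secret manifest."""
    plain, sensitive = {}, {}
    for key, value in settings.items():
        is_sensitive = any(marker in key.upper() for marker in SENSITIVE_MARKERS)
        (sensitive if is_sensitive else plain)[key] = str(value)

    config_map = {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": plain,
    }
    secret = {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        # stringData lets the API server handle base64 encoding.
        "stringData": sensitive,
    }
    return config_map, secret
```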

Identity & Auth:

  • [ ] Create or link existing realm
  • [ ] Configure client for the app

Persistence:

  • [ ] Expose connection details to the app via ConfigMap/Secret

Status Reporting

  • [ ] Populate .status.phase (e.g. Pending, Reconciling, Running, Error).
  • [ ] Report individual resource conditions (e.g. Deployment ready, TLS issued).
  • [ ] Expose endpoints (host URL, health URL) in status.
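
The combined status could be derived from per-resource conditions along these lines. The mapping is an assumption for this sketch: no conditions yet means Pending, any condition with status "False" means Error, all "True" means Running, and anything else means Reconciling. The `/actuator/health` path is likewise only a placeholder for the health endpoint.

```python
def compute_phase(conditions: list) -> str:
    """Collapse Kubernetes-style conditions into a single phase string."""
    if not conditions:
        return "Pending"
    statuses = [condition["status"] for condition in conditions]
    if "False" in statuses:
        return "Error"
    if all(status == "True" for status in statuses):
        return "Running"
    return "Reconciling"  # at least one condition is still "Unknown"


def build_status(host: str, conditions: list) -> dict:
    """Assemble the .status block for an App custom resource."""
    return {
        "phase": compute_phase(conditions),
        "conditions": conditions,
        "endpoints": {
            "url": f"https://{host}",
            "health": f"https://{host}/actuator/health",  # assumed path
        },
    }
```

The real operator may need a finer distinction, for example between a Deployment that is not ready yet and one that has failed.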

UI & CLI Integration

UI Changes:

  • [ ] Deploy Workflow: “Deploy” button emits an App CR instead of direct API calls
  • [ ] Operator Panel: In Settings/Application view, display:
  • Installation status (Installed / Missing)
  • Operator Deployment health (ready vs desired pods)
  • Operator version (container image tag)
  • [ ] Dashboard: Show Operator health indicator and provide link to logs/events
  • [ ] App Status: Read from App.status for unified health display
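
The Operator Panel values could be computed from the operator's Deployment object roughly as follows. The input is assumed to be the Deployment as a dictionary (or `None` when it is missing); `spec.replicas` and `status.readyReplicas` are standard Deployment fields, while `operator_panel` and `image_tag` are hypothetical helper names.

```python
def image_tag(image: str):
    """Return the tag of a container image reference, or None if untagged."""
    without_digest = image.split("@", 1)[0]
    # Only look after the last "/" so a registry port is not mistaken for a tag.
    last_segment = without_digest.rsplit("/", 1)[-1]
    if ":" in last_segment:
        return last_segment.rsplit(":", 1)[1]
    return None


def operator_panel(deployment) -> dict:
    """Summarize installation status, pod health, and version for the UI."""
    if deployment is None:
        return {"installed": False, "ready": 0, "desired": 0,
                "healthy": False, "version": None}
    spec = deployment.get("spec", {})
    desired = spec.get("replicas", 1)  # Kubernetes defaults replicas to 1
    ready = deployment.get("status", {}).get("readyReplicas", 0)
    image = spec["template"]["spec"]["containers"][0]["image"]
    return {
        "installed": True,
        "ready": ready,
        "desired": desired,
        "healthy": desired > 0 and ready >= desired,
        "version": image_tag(image),
    }
```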

CLI / GitOps:

  • [ ] Support kubectl apply -f app.yaml to create/update apps
  • [ ] Support kubectl delete -f app.yaml to tear down apps
  • [ ] Document sample manifests in Control Center docs
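
In a CI pipeline, an App manifest could be generated programmatically rather than written by hand. The sketch below renders one as JSON, which kubectl accepts alongside YAML; the helper names and the default of one replica are assumptions made for this example.

```python
import json


def app_manifest(name: str, host: str, image: str, replicas: int = 1) -> dict:
    """Build an App custom resource matching the sample manifest shape."""
    return {
        "apiVersion": "vaadin.com/v1alpha1",
        "kind": "App",
        "metadata": {"name": name},
        "spec": {"host": host, "image": image, "replicas": replicas},
    }


def render_manifest(manifest: dict) -> str:
    """Serialize a manifest to JSON text with a trailing newline."""
    return json.dumps(manifest, indent=2) + "\n"
```

Writing the rendered text to a file such as app.json would then allow `kubectl apply -f app.json` and `kubectl delete -f app.json` to work in the same way as with a YAML file.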

Automated tests

  • [ ] End-to-end tests using KinD or Kindnet.

Documentation

  • [ ] Migration documentation from 1.x: a guide for migrating existing apps manually.

Nice-to-haves

  • [ ] Replace Keycloak Operator
  • [ ] Compatibility with Native images
  • [ ] Automated migration of apps from Control Center 1.x

Risks, limitations and breaking changes

Risks

  • Stateful Migration: Converting existing UI-managed apps to CRs may leave orphaned resources.
  • The way the operator is used and the way the CRD is defined might prevent a sufficiently large share of the target group from using Control Center.

Limitations

  • Vaadin-only: Limited to Vaadin applications.
  • Single-namespace: Operator scoped to one namespace; multi-namespace deployments require extension.
  • Basic Rollouts: No built-in traffic shaping (beyond standard Kubernetes strategies).
  • Keycloak Scope: Only supports OIDC-based Keycloak realms; no SAML or other IdPs.

Breaking changes

  • Deprecation of ApplicationService: UI calls to ApplicationService for resource creation will be removed.
  • Manifest-first Workflow: Pipeline and UI workflows must transition from form-driven operations to manifest-driven deployments.
  • API Endpoints: Any direct REST calls to Control Center’s resource-provisioning endpoints will be retired.

Out of scope

  • Automated migration tooling for existing UI-managed apps (manual steps documented instead).
  • Cross-realm or cross-cluster identity federation.
  • Non-Kubernetes platforms (OpenShift-specific extensions can be considered later).
  • Integration with external secret stores (e.g. Vault); secrets remain in Kubernetes Secret objects.

Materials

No response

Metrics

  • Operator Adoption: Percentage of new app deployments via CRD vs. legacy UI.
  • Reconciliation Success Rate: Ratio of successful reconcile loops to failures.
  • Time to Deploy: Average time from kubectl apply to app status Running.
  • Resource Drift Incidents: Count of manual edits detected vs. total apps.
  • User Satisfaction: Survey feedback on workflow improvements post-operator rollout.
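
A few of these metrics reduce to simple arithmetic, sketched below. The inputs are assumed to be counts and timestamps collected elsewhere, and the function names are hypothetical. The success rate is interpreted here as successes divided by total reconcile attempts, which is an assumption about the intended definition.

```python
from datetime import datetime, timedelta


def reconciliation_success_rate(successes: int, failures: int):
    """Fraction of reconcile attempts that succeeded, or None without data."""
    total = successes + failures
    return successes / total if total else None


def operator_adoption(crd_deployments: int, legacy_deployments: int):
    """Percentage of new deployments made via the CRD, or None without data."""
    total = crd_deployments + legacy_deployments
    return 100.0 * crd_deployments / total if total else None


def average_time_to_deploy(samples: list):
    """Mean seconds between apply and Running for (applied_at, running_at) pairs."""
    durations = [(running - applied).total_seconds() for applied, running in samples]
    return sum(durations) / len(durations) if durations else None
```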

Pre-implementation checklist

  • [ ] Estimated (estimate entered into Estimate custom field)
  • [ ] Product Manager sign-off
  • [ ] Engineering Manager sign-off

Pre-release checklist

  • [ ] Documented (link to documentation provided in sub-issue or comment)
  • [ ] UX/DX tests conducted and blockers addressed
  • [ ] Approved for release by Product Manager

Security review

Peer reviewed

jojule, Feb 24 '25 02:02