
Helm Chart Templates For Katib

Open kunal-511 opened this issue 6 months ago • 21 comments

What this PR does / why we need it: Helm Templates For Katib

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #

Checklist:

  • [ ] Docs included if any changes are user facing

kunal-511 avatar Jun 17 '25 13:06 kunal-511

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

Once this PR has been reviewed and has the lgtm label, please assign johnugeorge for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.

Approvers can cancel approval by writing /approve cancel in a comment.

google-oss-prow[bot] avatar Jun 17 '25 13:06 google-oss-prow[bot]

cc: @juliusvonkohout

kunal-511 avatar Jun 17 '25 13:06 kunal-511

@kunal-511: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

google-oss-prow[bot] avatar Jun 22 '25 12:06 google-oss-prow[bot]

/ok-to-test

juliusvonkohout avatar Jun 23 '25 14:06 juliusvonkohout

/retest

kunal-511 avatar Jun 24 '25 04:06 kunal-511

@juliusvonkohout should this PR be linked to https://github.com/kubeflow/community/pull/832 ?

cc @kunal-511

varodrig avatar Jul 01 '25 01:07 varodrig

@kunal-511 let us know once this PR is ready for review. It seems there is a hold and two checks are failing.

Additionally, can you add a description and some comments to the PR? We have 58 files to review, and more comments would help us with the approval.

Also, I think it would be great to involve the Katib working group in this review. Thoughts, @juliusvonkohout? cc @andreyvelich

varodrig avatar Jul 01 '25 01:07 varodrig

/ok-to-test

juliusvonkohout avatar Jul 01 '25 12:07 juliusvonkohout

The tree structure of the Kubeflow Katib Helm chart directory, with an explanation of each file and folder:

katib(helm charts)/
├── Chart.yaml                                 # Helm chart metadata and version information
├── README.md                                  # Chart documentation and usage instructions
├── values.yaml                                # Default configuration values for the chart
├── ci/                                        # CI/CD configuration files for testing
│   ├── values-cert-manager.yaml               # Test values for cert-manager integration
│   ├── values-enterprise.yaml                 # Test values for enterprise deployment
│   ├── values-external-db.yaml                # Test values for external database configuration
│   ├── values-kubeflow.yaml                   # Test values for Kubeflow integration
│   ├── values-leader-election.yaml            # Test values for leader election setup
│   ├── values-openshift.yaml                  # Test values for OpenShift platform
│   ├── values-postgres.yaml                   # Test values for PostgreSQL database
│   ├── values-production.yaml                 # Test values for production deployment
│   └── values-standalone.yaml                 # Test values for standalone deployment
├── crds/                                      # Custom Resource Definitions for Katib
│   ├── experiment.yaml                        # CRD for hyperparameter tuning experiments
│   ├── suggestion.yaml                        # CRD for algorithm suggestions
│   └── trial.yaml                             # CRD for individual experiment trials
├── templates/                                 # Helm template files for Kubernetes resources
│   ├── _helpers.tpl                           # Template helper functions and common definitions
│   ├── autoscaling/                           # Horizontal Pod Autoscaler configurations
│   │   └── hpa.yaml                           # HPA template for scaling components
│   ├── config/                                # Configuration files and ConfigMaps
│   │   └── configmap.yaml                     # Main configuration settings for Katib
│   ├── controller/                            # Katib controller deployment resources
│   │   ├── deployment.yaml                    # Controller deployment specification
│   │   ├── leader-election-rbac.yaml          # RBAC for leader election functionality
│   │   ├── rbac.yaml                          # Role-based access control for controller
│   │   ├── service.yaml                       # Service for controller communication
│   │   ├── serviceaccount.yaml                # ServiceAccount for controller pod
│   │   └── trial-templates-configmap.yaml     # Trial template configurations
│   ├── database/                              # Database deployment resources
│   │   ├── mysql-deployment.yaml              # MySQL database deployment
│   │   ├── mysql-pvc.yaml                     # MySQL persistent volume claim
│   │   ├── mysql-service.yaml                 # MySQL service for database access
│   │   ├── postgres-deployment.yaml           # PostgreSQL database deployment
│   │   ├── postgres-pvc.yaml                  # PostgreSQL persistent volume claim
│   │   ├── postgres-service.yaml              # PostgreSQL service for database access
│   │   └── secret.yaml                        # Database credentials and secrets
│   ├── db-manager/                            # Database manager component
│   │   ├── deployment.yaml                    # DB manager deployment specification
│   │   └── service.yaml                       # Service for DB manager communication
│   ├── db/                                    # Additional database configurations
│   ├── istio/                                 # Istio service mesh integration
│   │   ├── authorization-policy.yaml          # Istio authorization policies
│   │   └── virtual-service.yaml               # Istio virtual service configuration
│   ├── monitoring/                            # Monitoring and observability resources
│   │   └── servicemonitor.yaml                # Prometheus ServiceMonitor for metrics
│   ├── namespace/                             # Namespace creation resources
│   │   └── namespace.yaml                     # Namespace definition for Katib
│   ├── openshift/                             # OpenShift-specific configurations
│   ├── rbac/                                  # Additional RBAC configurations
│   │   └── kubeflow-roles.yaml                # Kubeflow-specific role definitions
│   ├── security/                              # Security-related configurations
│   │   ├── network-policy.yaml                # Network policies for pod communication
│   │   └── pod-disruption-budget.yaml         # Pod disruption budget for availability
│   ├── ui/                                    # Katib UI component resources
│   │   ├── deployment.yaml                    # UI deployment specification
│   │   ├── rbac.yaml                          # RBAC for UI component
│   │   ├── service.yaml                       # Service for UI access
│   │   └── serviceaccount.yaml                # ServiceAccount for UI pod
│   └── webhook/                               # Admission webhook configurations
│       ├── certificate.yaml                   # TLS certificates for webhook
│       ├── mutating-webhook.yaml              # Mutating admission webhook
│       └── validating-webhook.yaml            # Validating admission webhook
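As a rough illustration of how the `ci/` values files in the tree above might be exercised, here is a minimal Python sketch that builds one `helm lint` command per `ci/values-*.yaml` file. This is an assumption about a possible workflow, not the CI setup used in this PR; the helper names `lint_commands` and `run_lint` and the chart path passed to them are hypothetical.

```python
from pathlib import Path
import subprocess


def lint_commands(chart_dir):
    """Build one `helm lint` command per values file found in <chart_dir>/ci.

    Assumption: test values files follow the ci/values-*.yaml naming
    shown in the tree above.
    """
    chart = Path(chart_dir)
    values_files = sorted((chart / "ci").glob("values-*.yaml"))
    return [["helm", "lint", str(chart), "--values", str(f)] for f in values_files]


def run_lint(chart_dir):
    """Run every lint command and return the values files whose lint failed.

    Requires the `helm` binary on PATH.
    """
    failed = []
    for cmd in lint_commands(chart_dir):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(cmd[-1])  # last element is the values file path
    return failed
```

Calling `run_lint` with the path to the chart directory would return an empty list when every configuration lints cleanly. Keeping command construction separate from execution makes the file discovery easy to check without Helm installed.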

@juliusvonkohout @varodrig

kunal-511 avatar Jul 07 '25 12:07 kunal-511

@anencore94

@Electronic-Waste

Can you help here?

juliusvonkohout avatar Jul 18 '25 11:07 juliusvonkohout

/ok-to-test

juliusvonkohout avatar Jul 18 '25 11:07 juliusvonkohout

/ok-to-test

varodrig avatar Jul 20 '25 20:07 varodrig

Great job, @kunal-511, thank you so much for working on this. A lot of work and thought went into understanding the current Katib kustomize files, the installation, and Helm.

I appreciate the amazing work you have done.

We need @Electronic-Waste to do a final review. @Electronic-Waste, I left a few comments for you. Please review ci in detail. There are good practices implemented here, including security constraints, PDBs, network policies, and service monitoring; please make sure you are OK with this implementation.

varodrig avatar Jul 20 '25 20:07 varodrig

@Electronic-Waste @andreyvelich can you review please? thank you

cc @juliusvonkohout

varodrig avatar Aug 21 '25 14:08 varodrig

> @Electronic-Waste @andreyvelich can you review please? thank you
>
> cc @juliusvonkohout

First the tests must be fixed and we need a rebase to master.

juliusvonkohout avatar Aug 28 '25 19:08 juliusvonkohout

> @Electronic-Waste @andreyvelich can you review please? thank you cc @juliusvonkohout
>
> First the tests must be fixed and we need a rebase to master.

The rebase to master is done. Fixing the tests.

kunal-511 avatar Aug 28 '25 19:08 kunal-511

/retest

kunal-511 avatar Aug 29 '25 05:08 kunal-511

@andreyvelich for approval

juliusvonkohout avatar Sep 04 '25 18:09 juliusvonkohout