Duplicate hyperparameters waste compute and time
What happened?
I am experiencing a recurring issue where the hyperparameter tuning process generates duplicate sets of parameters, leading to inefficient use of GPU resources.
For instance, with the experimental setup below:
spec:
  algorithm:
    algorithmName: bayesianoptimization
  maxTrialCount: 10
  metricsCollectorSpec:
    collector:
      kind: StdOut
  objective:
    goal: 1
    metricStrategies:
      - name: Accuracy
        value: max
    objectiveMetricName: Accuracy
    type: maximize
  parallelTrialCount: 1
  parameters:
    - feasibleSpace:
        list:
          - "0.01"
          - "1"
          - "5"
          - "10"
          - "0.1"
      name: C
      parameterType: categorical
    - feasibleSpace:
        list:
          - linear
          - rbf
          - poly
          - sigmoid
      name: kernal
      parameterType: categorical
    - feasibleSpace:
        list:
          - "0.1"
          - "0.001"
          - "0.01"
      name: gamma
      parameterType: categorical
The suggestions provided by the algorithm are often redundant. The following output illustrates this problem, showing only the duplicated suggestions for clarity:
spec:
  algorithm:
    algorithmName: bayesianoptimization
  requests: 10
  resumePolicy: Never
status:
  suggestionCount: 10
  suggestions:
    - name: mnist-pytorch-rep-bo-1-v1-g2qml7vd
      parameterAssignments:
        - name: C
          value: "10"
        - name: kernal
          value: poly
        - name: gamma
          value: "0.1"
    - name: mnist-pytorch-rep-bo-1-v1-hjzn82gn
      parameterAssignments:
        - name: C
          value: "10"
        - name: kernal
          value: poly
        - name: gamma
          value: "0.1"
I have run this experiment repeatedly, and in extreme cases, all 10 suggestions are identical.
What did you expect to happen?
- Filter out duplicated hyperparameters.
- Early stop once the hyperparameter search space is exhausted.
Environment
- Kubernetes version: v1.25.14
- Katib controller version: kubeflow/katib-controller:v0.16.0
Impacted by this bug?
Give it a 👍 We prioritize the issues with most 👍
Proposal: Prevent Duplicate Hyperparameter Suggestions
Issue: #2571
Summary: Add duplicate detection in suggestion controller to prevent wasted compute on identical hyperparameter trials.
Problem
Katib currently lets duplicate suggestions create trials. Users report that "in extreme cases, all 10 suggestions are identical," causing:
- Wasted GPU/CPU on redundant experiments
- Inefficient use of trial budgets
- Poor user experience with small categorical search spaces
Example (60 categorical combinations, Bayesian Optimization):
# Both suggestions identical - waste of resources
- trial-1: {C: "10", kernel: poly, gamma: "0.1"}
- trial-2: {C: "10", kernel: poly, gamma: "0.1"}
Root Cause:
- scikit-optimize uses random sampling for the first n_initial_points=10 trials
- Small categorical spaces → high duplicate probability
- No validation layer prevents duplicate trials
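The duplicate risk from random initial sampling can be estimated with a birthday-problem calculation. A rough sketch, using the 60-combination space and scikit-optimize's default of 10 random initial trials from above:

```python
from math import prod

# Size of the categorical search space: 5 (C) x 4 (kernel) x 3 (gamma)
space_size = 5 * 4 * 3  # 60 combinations

# scikit-optimize draws the first n_initial_points trials uniformly at random
n_initial_points = 10

# Birthday problem: probability that all 10 random draws are distinct
p_all_unique = prod((space_size - i) / space_size for i in range(n_initial_points))
p_at_least_one_dup = 1 - p_all_unique

print(f"P(at least one duplicate) = {p_at_least_one_dup:.2f}")  # ~0.55
```

So even before any Bayesian modeling kicks in, more than half of runs with this spec are expected to contain at least one duplicated trial, consistent with the behavior reported in the issue.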
Solution
Add duplicate filtering in SyncAssignments() before creating trials:
// pkg/controller.v1beta1/suggestion/suggestionclient/suggestionclient.go

func parametersEqual(a, b []commonapiv1beta1.ParameterAssignment) bool {
	if len(a) != len(b) {
		return false
	}
	aMap := make(map[string]string)
	for _, p := range a {
		aMap[p.Name] = p.Value
	}
	for _, p := range b {
		if aMap[p.Name] != p.Value {
			return false
		}
	}
	return true
}

func isDuplicate(
	assignment []commonapiv1beta1.ParameterAssignment,
	suggestions []suggestionsv1beta1.TrialAssignment,
	trials []trialsv1beta1.Trial) bool {
	for _, s := range suggestions {
		if parametersEqual(assignment, s.ParameterAssignments) {
			return true
		}
	}
	for _, t := range trials {
		if parametersEqual(assignment, t.Spec.ParameterAssignments) {
			return true
		}
	}
	return false
}

func (g *General) SyncAssignments(...) error {
	// ... get suggestions from algorithm ...
	uniqueAssignments := []suggestionsv1beta1.TrialAssignment{}
	duplicateCount := 0
	for _, suggestion := range responseSuggestion.ParameterAssignments {
		if isDuplicate(suggestion.Assignments, instance.Status.Suggestions, ts) {
			duplicateCount++
			continue
		}
		uniqueAssignments = append(uniqueAssignments, createTrialAssignment(suggestion))
	}
	if duplicateCount > 0 {
		g.recorder.Eventf(instance, corev1.EventTypeWarning,
			"DuplicateSuggestionsFiltered",
			"Filtered %d duplicate(s), created %d unique suggestion(s)",
			duplicateCount, len(uniqueAssignments))
	}
	instance.Status.Suggestions = append(instance.Status.Suggestions, uniqueAssignments...)
	return nil
}
Why controller-side:
- ✅ Works with all algorithms (no library changes)
- ✅ Single validation point
- ✅ Backward compatible
- ✅ O(n) performance with hash maps
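The hash-map idea behind the dedup check can be shown language-agnostically by reducing each assignment list to an order-independent canonical key and keeping a set of seen keys. A minimal Python sketch (the names and data shapes are illustrative, not Katib APIs):

```python
def canonical_key(assignments):
    """Order-independent key for a list of {name, value} parameter assignments."""
    return tuple(sorted((a["name"], a["value"]) for a in assignments))

def filter_duplicates(new_suggestions, existing):
    """Return suggestions whose parameter sets are not already in `existing`."""
    seen = {canonical_key(s) for s in existing}
    unique = []
    for s in new_suggestions:
        key = canonical_key(s)
        if key not in seen:
            seen.add(key)  # also dedups within the new batch itself
            unique.append(s)
    return unique

existing = [[{"name": "C", "value": "10"}, {"name": "kernel", "value": "poly"}]]
batch = [
    [{"name": "kernel", "value": "poly"}, {"name": "C", "value": "10"}],  # duplicate, reordered
    [{"name": "C", "value": "1"}, {"name": "kernel", "value": "rbf"}],    # unique
]
print(len(filter_duplicates(batch, existing)))  # → 1
```

Building the `seen` set once per sync keeps the whole pass linear in the number of suggestions, rather than comparing every candidate against every existing trial pairwise.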
Testing
Unit Tests:
func TestParametersEqual(t *testing.T) {
	// Test identical parameters → true
	// Test different order, same values → true
	// Test different values → false
}

func TestIsDuplicate(t *testing.T) {
	// Test against existing suggestions
	// Test against existing trials
	// Test with no duplicates
}
Integration Test - Small categorical space (2×2=4 combinations):
parameters:
  - name: optimizer
    parameterType: categorical
    feasibleSpace: {list: ["adam", "sgd"]}
  - name: activation
    parameterType: categorical
    feasibleSpace: {list: ["relu", "tanh"]}
maxTrialCount: 10
Expected: 4 unique trials, duplicates filtered, warning event emitted.
Alternatives Considered
| Alternative | Reason Rejected |
|---|---|
| Modify algorithm services (skopt, optuna) | High maintenance, must fork libraries |
| Database deduplication | Performance overhead, complex schema changes |
| User-configurable option | Doesn't solve problem by default |
Files Modified:
- pkg/controller.v1beta1/suggestion/suggestionclient/suggestionclient.go
- pkg/controller.v1beta1/suggestion/suggestionclient/suggestionclient_test.go
@andreyvelich Could you please review this proposal and let me know if you have any feedback? Once I get your input, I’ll start working on the issue.
Thanks for creating this @Antsypc!
@NarayanaSabari I would suggest that we first migrate away from the deprecated skopt algorithm service: https://github.com/kubeflow/katib/issues/2280
As we discussed, we would like to use Optuna's GPSampler: https://github.com/kubeflow/katib/issues/2280#issuecomment-1993658378
Then, we need to see whether this sampler produces any duplicated hyperparameters.
Do you want to take over this task?
cc @contramundum53
Thanks @andreyvelich for the feedback!
You're absolutely right - addressing the root cause makes more sense. I'll take over the migration from deprecated scikit-optimize to Optuna's GPSampler (#2280) first.
Plan:
- Complete the skopt → Optuna GPSampler migration
- Test if duplicate hyperparameters still occur with the new algorithm
- If duplicates persist, implement the controller-level filtering as an algorithm-agnostic safety layer
This approach will modernize Katib's Bayesian Optimization to use a maintained library while naturally testing whether duplicates are algorithm-specific or a broader issue.
I'll start working on #2280 and report back with findings. Thanks for the guidance!
/assign
If you decide to use GPSampler in Optuna and you don't want duplicate suggestions, I would suggest using deterministic_objective=True. That option basically fixes the noise variance to a very low level, and would hopefully eliminate most of the duplicate suggestions.
If you have stricter requirements (if you want to never have duplicate suggestions, even stop automatically when the search space is exhausted), you would still need some mechanisms to prevent them.
One limitation is that deterministic_objective=True really assumes the objective function is deterministic; if it is not, the optimization may become unstable (especially with continuous search spaces, where there can be many points that are very close but not identical).
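The stricter requirement described above — never repeat a suggestion and stop once a finite categorical space is exhausted — can be sketched as a rejection loop around the sampler. This is a hypothetical stdlib-only illustration in which a uniform random sampler stands in for Optuna's GPSampler; none of the names below are Katib or Optuna API:

```python
import itertools
import random

# The categorical space from the original experiment (kernel spelling corrected)
space = {
    "C": ["0.01", "0.1", "1", "5", "10"],
    "kernel": ["linear", "rbf", "poly", "sigmoid"],
    "gamma": ["0.001", "0.01", "0.1"],
}
total = len(list(itertools.product(*space.values())))  # 60 combinations

seen = set()
trials = []
while len(seen) < total:
    # Stand-in for asking the algorithm service for a suggestion
    candidate = tuple((k, random.choice(v)) for k, v in space.items())
    if candidate in seen:
        continue  # duplicate: re-sample instead of launching a redundant trial
    seen.add(candidate)
    trials.append(dict(candidate))

# The loop exits exactly when the search space is exhausted
print(len(trials))  # 60
```

In a real implementation the re-sampling would have to be bounded (or replaced by enumeration of the remaining combinations) so the controller cannot spin when the algorithm keeps proposing the same point, and the exhaustion condition would map to marking the Experiment succeeded early.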