
Repeated trials without convergence

Open HBSaddiq opened this issue 1 year ago • 3 comments

Hey team, thanks for developing and maintaining the Ax library, it's great!

I had a quick issue/bug that's cropping up in my optimisation relating to repetition of trial suggestions. In my problem, I attempt to minimise two objectives which are both a function of 3 parameters (technically 4 but I infer the value of the final variable using an equality constraint). On many occasions, this works perfectly, with the following plots showing the BO scheme successfully exploring the Pareto front with successive trials, and all trials being relatively diverse from one another.

However, when I set different seeds in the optimiser (using the random_seed argument in the AxClient initialisation) the optimisation often degenerates partway through the trials I run, with the same trial being suggested many times. For example, the following two plots show this behaviour, with the final 15 or so trials being identical to one another, and all placed at around (50,30) in the metric space plot.
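For concreteness, the setup looks roughly like the sketch below; the parameter and metric names, bounds, and seed value are placeholders rather than the ones from my real problem:

```python
# Rough sketch of the setup described above. Parameter/metric names, bounds
# and the seed value are placeholders, not the real ones from my problem.
from ax.service.ax_client import AxClient
from ax.service.utils.instantiation import ObjectiveProperties

ax_client = AxClient(random_seed=7)  # behaviour differs between seed values
ax_client.create_experiment(
    name="two_objective_problem",
    # Only three parameters are exposed to Ax; the fourth variable is
    # inferred outside the optimiser via the equality constraint.
    parameters=[
        {"name": "x1", "type": "range", "bounds": [0.0, 1.0]},
        {"name": "x2", "type": "range", "bounds": [0.0, 1.0]},
        {"name": "x3", "type": "range", "bounds": [0.0, 1.0]},
    ],
    objectives={
        "obj_a": ObjectiveProperties(minimize=True),
        "obj_b": ObjectiveProperties(minimize=True),
    },
)
```

The repeated suggestions are exact duplicates in parameter space, which is easy to confirm by checking `ax_client.get_trials_data_frame()` for duplicated parameter rows.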

This feels similar to issue 228, but:

  • I don't seem to get any numerical stability warnings/errors that would help to explain the problem.

  • It doesn't seem like the optimisation has necessarily converged; for other seeds, the optimisation gets stuck earlier, at a different point, without fully exploring the Pareto front:

  • I unfortunately can't rely on pure Sobol sampling to fully explore parameter space, as suggested in the thread.

Is this something the team has come across before? My hunch is that it could be the "SLSQP" optimiser getting stuck at its initial point when attempting to maximise the acquisition function. Is this feasible? I think it only really makes sense if the previous trial is used as the initial point for this inner optimisation (hence the repeated trials), but I wondered if there could be other obvious factors at play that I might be missing. Thanks again!

HBSaddiq avatar Feb 16 '23 10:02 HBSaddiq

Thank you for reaching out about this. I have a good feeling your hunch about the acquisition function optimization could be on the right track -- @danielrjiang @SebastianAment could you two take a look at this when you get the chance?

mpolson64 avatar Feb 16 '23 14:02 mpolson64

Hi @HBSaddiq, thanks for raising the issue. A few questions:

  1. What are your objective thresholds (i.e. the reference point)? This can significantly influence the optimization behavior.
  2. What is the search space (continuous params, ints, categoricals)? Discrete params could lead to a flatter acquisition surface.
  3. What is the true Pareto frontier (if you have a sense of this)? I'm wondering how close to convergence Ax is getting. If it is close to convergence, most of the space might have a near-zero (or zero) acquisition value.

sdaulton avatar Feb 17 '23 16:02 sdaulton

Hi both, thanks for getting back to me with further clarifying questions.

  1. The reference point is (50, 10), which the optimiser seems to respect, as the bulk of the Pareto points are generated within this regime (see the sketch after this list for how this maps onto objective thresholds).
  2. The search space is continuous
  3. The Pareto front found is "expected"/true in a sense, given that I took a look at these points in parameter space and they seem reasonable. However, I can formulate a similar problem with one extra dimension which exhibits even worse behaviour, where the optimiser sometimes finds the front but prematurely stops exploring (first image), and in other cases doesn't find the front at all (second image).
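For completeness, the reference point above corresponds to the per-objective thresholds passed at experiment creation; with the placeholder metric names from my earlier sketch, that looks something like:

```python
# Where the (50, 10) reference point enters the Service API: as per-objective
# thresholds. "obj_a"/"obj_b" are placeholder metric names.
from ax.service.utils.instantiation import ObjectiveProperties

objectives = {
    "obj_a": ObjectiveProperties(minimize=True, threshold=50.0),
    "obj_b": ObjectiveProperties(minimize=True, threshold=10.0),
}
# passed as ax_client.create_experiment(..., objectives=objectives)
```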

My current hunch is that the search space is relatively large compared to the region of interest where the Pareto points lie, with most of the search space taken up by very suboptimal points which will never contribute to the hypervolume bounded by the front (since the reference point is relatively tight). As a result, most of the search space has an acquisition function value of zero (I use the default qNEHVI) and is very flat, as you say, so optimising with only the default of 20 random restarts using SLSQP isn't likely to find the region of interest and gets stuck at a suboptimal point.

Does this sound plausible? I'm currently in the process of trying qNParEGO, and also trying out the sample_around_best flag for generating the initial candidates, which I think might solve the issue if the hunch above is true.
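For anyone following along, the kind of change I have in mind for the acquisition optimisation looks roughly like the standalone BoTorch-level sketch below. It uses synthetic data and placeholder values (reference point, bounds, restart counts) and is not necessarily the exact code path Ax takes internally:

```python
# Standalone BoTorch-level sketch (synthetic data, placeholder values) of
# optimising qNEHVI with more restarts / raw samples and the
# "sample_around_best" initialisation option. Not the exact path Ax takes.
import torch
from botorch.acquisition.multi_objective import qNoisyExpectedHypervolumeImprovement
from botorch.fit import fit_gpytorch_mll
from botorch.models import ModelListGP, SingleTaskGP
from botorch.optim import optimize_acqf
from gpytorch.mlls import SumMarginalLogLikelihood

# Synthetic 3-parameter, 2-objective data (BoTorch maximises, so a
# minimisation problem would be negated before reaching this point).
train_X = torch.rand(20, 3, dtype=torch.double)
Y1 = -(train_X**2).sum(dim=-1, keepdim=True)
Y2 = -((train_X - 0.5) ** 2).sum(dim=-1, keepdim=True)

model = ModelListGP(SingleTaskGP(train_X, Y1), SingleTaskGP(train_X, Y2))
fit_gpytorch_mll(SumMarginalLogLikelihood(model.likelihood, model))

acqf = qNoisyExpectedHypervolumeImprovement(
    model=model,
    ref_point=[-3.0, -3.0],  # placeholder reference point (maximisation convention)
    X_baseline=train_X,
    prune_baseline=True,
)

bounds = torch.tensor([[0.0] * 3, [1.0] * 3], dtype=torch.double)
candidate, acq_value = optimize_acqf(
    acq_function=acqf,
    bounds=bounds,
    q=1,
    num_restarts=40,    # more than the 20 restarts mentioned above
    raw_samples=1024,   # more raw samples used to pick the restart points
    options={"sample_around_best": True},  # bias restarts towards incumbents
)
```

If the flat-acquisition hunch is right, raising raw_samples/num_restarts and sampling around the best observed points should make it much less likely that every restart lands in the zero-acquisition region.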

Thanks again!

HBSaddiq avatar Feb 21 '23 08:02 HBSaddiq