
mbo estimate with kriging is constant at some iterations

verenamayer opened this issue 7 years ago • 13 comments

Sometimes it happens that the kriging estimate is constant over the whole parameter space. Therefore the infill criterion and the se estimate are constant as well. That's probably not good. Here is a small example:

library(smoof)
library(mlrMBO)

alpine = makeAlpine01Function(2)

lrn = makeLearner("regr.km", predict.type = "se")

ctrl = makeMBOControl()
ctrl = setMBOControlTermination(ctrl, iters = 5)
ctrl = setMBOControlInfill(ctrl, crit = "ei", 
                           opt = "focussearch", 
                           opt.focussearch.maxit = 2, 
                           opt.focussearch.points = 10)

set.seed(11)
initdes = generateDesign(par.set = getParamSet(alpine), n = 10)

run = exampleRun(fun = alpine, 
                 design = initdes, 
                 learner = lrn, 
                 control = ctrl, 
                 points.per.dim = 50L, 
                 show.info = TRUE)

plotExampleRun(run)

verenamayer avatar Sep 22 '16 12:09 verenamayer

some comments from my side

  • we don't have to fix this now, before the CRAN upload, but this should become high prio soon
  • this is really not our fault, but a problem in the fit of DiceKriging. also, it does not help (robustly) to:
      – change the kernel
      – add a slight nugget effect
    the points are also not too close together (the distance is high enough). i checked all of this with verena.
  • first order of business should be to detect this in mlrMBO. this is simple. either do this model-agnostically (does focus search always see the same points?), or we could ask the DiceKriging model about its internal params, but that is then very specialized for this model
  • then we can talk about what to do. we could just spawn random points, but i would rather fit a different "fallback" model in such cases, e.g. an RF
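The model-agnostic detection idea above could be sketched roughly like this (a hypothetical helper, not the actual mlrMBO implementation; `isConstantModel` and the tolerance are made up for illustration):

```r
# Hypothetical sketch: detect a degenerate surrogate model-agnostically by
# predicting on a random sample of the parameter space and checking whether
# the response is numerically constant.
library(mlr)
library(ParamHelpers)

isConstantModel = function(model, par.set, n = 1000L, tol = 1e-8) {
  # sample random points from the parameter space
  newdata = generateRandomDesign(n = n, par.set = par.set)
  pred = predict(model, newdata = newdata)$data$response
  diff(range(pred)) < tol
}

# if this returns TRUE, one could fall back to a different surrogate, e.g.
# makeLearner("regr.randomForest", predict.type = "se")
```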

berndbischl avatar Sep 22 '16 12:09 berndbischl

first order of business should be to detect this in mlrMBO. this is simple. either do this model agnostically

This is already implemented in branch:smart_scheduling

jakob-r avatar Sep 26 '16 10:09 jakob-r

This is already implemented in branch:smart_scheduling

that's good. can we please try to extract these very useful things from the branch and merge them into master a bit sooner? this would also reduce the horrible problem of reviewing a very "rich" branch at the end.

berndbischl avatar Sep 26 '16 15:09 berndbischl

So it does not even seem to be a problem specific to DiceKriging: GPfit also produces constant predictions (with or without a nugget effect).

ja-thomas avatar Sep 29 '16 07:09 ja-thomas

Good to know! Probably they run into the same numerical problems?

jakob-r avatar Sep 29 '16 07:09 jakob-r

[image: constant]

here is a small simulation I ran. ~~It seems really strange that when we add a nugget effect (10^-3) the model will be constant in more situations...~~

ja-thomas avatar Oct 19 '16 13:10 ja-thomas

[image: objective]

and here are the objective values

ja-thomas avatar Oct 19 '16 13:10 ja-thomas

Ok some more insights:

  • the model is not truly constant. It has some spikes at the already evaluated points, which still get interpolated (with an error around 10^-16).
  • The model seems to be smooth, e.g., points that are very close to a design point (within about 10^-7) lie between the design value and the "constant" value
  • setting jitter = TRUE doesn't change anything
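The interpolation-spike behavior can be checked directly with DiceKriging. A sketch under assumed setup (my own design and seed, not the original script; whether km actually produces the degenerate flat fit depends on the data):

```r
# Check how a km fit behaves at and just next to the design points.
library(smoof)
library(ParamHelpers)
library(DiceKriging)

set.seed(11)
alpine = makeAlpine01Function(2)
X = generateDesign(par.set = getParamSet(alpine), n = 10)
y = apply(X, 1, alpine)
fit = km(design = X, response = y)

# the design points themselves are still interpolated almost exactly
p.design = predict(fit, newdata = X, type = "SK")$mean
max(abs(p.design - y))  # should be tiny (near machine precision)

# points a small distance (~ 1e-7) away already move toward the flat level
predict(fit, newdata = X + 1e-7, type = "SK")$mean
```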

ja-thomas avatar Oct 20 '16 12:10 ja-thomas

Can you put the script in a gist?

jakob-r avatar Oct 20 '16 13:10 jakob-r

https://gist.github.com/ja-thomas/6e12b4d58ddefddaa9626631e1e8cebd

ja-thomas avatar Oct 20 '16 13:10 ja-thomas

Thx @ja-thomas

jakobbossek avatar Oct 26 '16 18:10 jakobbossek

I also had a look at this problem some months ago and my conclusion was: DiceKriging calculates the parameters of the kriging model via numerical optimization. This numerical optimization can fail, i.e., it may only find a local optimum and not the globally best parameters. Since the internal optimization of DiceKriging is stochastic, simply fitting the model again can return the "optimal", non-constant model.

However, I only looked at some small examples (something like initial designs of size 4) and I don't know if this explanation extends to other cases.
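That refitting idea can be sketched as follows (an illustrative helper, not part of any package; assumes a design `X` and response `y` are already defined):

```r
# Because the internal likelihood optimization is stochastic, refitting
# several times and keeping the fit with the best log-likelihood can escape
# the degenerate local optimum.
library(DiceKriging)

bestKm = function(X, y, n.fits = 5L) {
  fits = replicate(n.fits, km(design = X, response = y), simplify = FALSE)
  # model@logLik holds the log-likelihood of the fitted km model
  fits[[which.max(vapply(fits, function(m) m@logLik, numeric(1)))]]
}
```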

danielhorn avatar Oct 31 '16 09:10 danielhorn

I tested three more settings: km with BFGS, km with BFGS and three restarts, and km with rgenoud.

[images: fraction_constant, res]

# A tibble: 3 × 2
    algorithm mean_runtime
       <fctr>        <dbl>
1          km     11.36650
2 km_restarts     13.78398
3  km_rgenoud     12.27593

The restarts do not seem to help, but rgenoud reduced the number of times the model is constant, while the objective values seem similar for all methods. rgenoud is also slightly slower.

We could think about switching to rgenoud as the default optimizer, but since reducing the number of times the model becomes constant does not really seem to improve performance, I'm not sure we really need to do that.
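For reference, the three settings compared above map roughly onto these km() calls (a sketch; `X` and `y` are an assumed design and response, and the `multistart` argument is only available in more recent DiceKriging versions):

```r
library(DiceKriging)

fit.bfgs     = km(design = X, response = y, optim.method = "BFGS")
fit.restarts = km(design = X, response = y, optim.method = "BFGS",
                  multistart = 3)             # restarts of the BFGS optimizer
fit.rgenoud  = km(design = X, response = y, optim.method = "gen")  # rgenoud
```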

ja-thomas avatar Nov 07 '16 09:11 ja-thomas