Surrogates.jl
Surrogate Global Optimization Challenge Problem
It's probably a good time to start thinking about getting a challenge problem here. I tried using the surrogate optimization for global parallel optimization and ran into scaling issues with large design matrices, since it iteratively adds one point at a time. The case I was working on was proprietary so I cannot share it, but I think it would be good to get an open challenge problem to start tackling here, since I think we're close but not all the way there yet.
@jlperla I think you might have an example?
How large are we talking @ChrisRackauckas?
I was thinking something like 50 variables with objective evaluation times of a few seconds. If the objective evaluations take minutes or hours, I think it might be "too easy" to hide the issues of surrogate construction.
Interesting! I am up for it; however, I need to think about how to solve the "iteratively adding a point at a time" issue to let things scale. It has actually been a known issue since the early days of Surrogates, but I still don't have many ideas.
Maybe @PatricioFarrell has some ideas, or @agerlach's idea of using iterative solvers might be helpful here because then it could use the previous solution as an initialization for the surrogate?
Actually, I think I have an idea. But let me first understand what exactly the problem is: would you like to add more than one point at a time (instead of only one), and/or simply avoid the gradually increasing overhead by "remembering" the previous computations?
We are referring to your second point: it would be great not to recalculate everything when we are just adding a single point, so in some sense remembering the previous calculations, yes. But your first point also looks worth exploring!
When the surrogate starts to accumulate a lot of points, it solves a fairly large linear system at each step. If the objective function evaluation dominates the cost that's okay, but otherwise solving a 1000x1000 linear system from scratch every time becomes its own overhead, which then makes it difficult to scale a global optimization.
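To make that overhead concrete, here is a minimal 1D sketch (assuming a Gaussian kernel; this is illustrative only, not the Surrogates.jl internals):

```julia
using LinearAlgebra

# Minimal illustration of the scaling issue: the dense interpolation matrix
# grows by one row/column per sample, and a naive loop refactorizes it from
# scratch on every update, costing O(n^3) each time.
φ(r) = exp(-r^2)                      # assumed Gaussian basis function

function fit_rbf_coefficients(xs::Vector{Float64}, ys::Vector{Float64})
    n = length(xs)
    A = [φ(abs(xs[i] - xs[j])) for i in 1:n, j in 1:n]
    return A \ ys                     # full dense solve at every iteration
end
```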
I agree, you don't want to recompute. The trick is to use a multilevel residual error correction scheme, see e.g.
https://epubs.siam.org/doi/10.1137/120898383 https://dx.doi.org/doi:10.1007/s00211-010-0313-8
This becomes very effective if you fill in the area of interest homogeneously, rather than one point at a time. However, I understand that maybe there is no other way if evaluating the objective function is costly. Luckily, some of the needed machinery was in my pull request (which I think @ludoro incorporated the other day). I'm also happy to discuss via Skype/Zoom.
This looks interesting! I will take some time to understand those!
A couple of thoughts off the top of my head:
- The Sherman–Morrison–Woodbury formula can be used to perform rank-1 updates on the inverse of a matrix (see the sketch after this list). I have used it in the past, but it requires explicitly knowing the inverse of the original matrix... at least that's how I used it.
- In that same vein, you can perform rank-1 updates on a factorization of the interpolation matrix that readily leads to solutions of the linear system. This can also be used to remove points, which may be helpful if you want to fit in a rolling time window, as with time-delay embedding in DMD.
- Using the iterative-solvers approach I mentioned in #52 would be helpful for compactly supported basis functions, since only the coefficients corresponding to nodes within the support of the basis centered on the new node will change. So, yes, the previous solution should be a good initial guess (see the warm-start sketch after this list). I doubt it will help if the basis is not compactly supported.
- Leverage the idea of surrogates of surrogates called the Partition of Unity Method (sketched below). Here you partition the domain into N overlapping regions and then perform N + 1 fits: one for each partition, and one that determines the linear combination of these regional fits into a single global fit. You trade a single very large linear system for many very small linear systems. You would then only need to re-solve the linear system corresponding to the region the new node was added to, plus the global fit. The global fit only uses nodes at the center of each region, so it is not huge.
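A minimal sketch of the Sherman–Morrison rank-1 inverse update mentioned in the first bullet (generic linear algebra, not the Surrogates.jl API; when a sample is added the interpolation matrix actually grows by a row and column, so in practice one would use the analogous block-bordering update, but the principle is the same):

```julia
using LinearAlgebra

# Sherman–Morrison: (A + u v')⁻¹ = A⁻¹ - (A⁻¹ u v' A⁻¹) / (1 + v' A⁻¹ u)
# Keeps an explicit inverse up to date in O(n^2) instead of refactorizing in O(n^3).
function sherman_morrison!(Ainv::AbstractMatrix, u::AbstractVector, v::AbstractVector)
    Au = Ainv * u                                     # A⁻¹ u
    vA = v' * Ainv                                    # v' A⁻¹ (row)
    Ainv .-= (Au * vA) ./ (1 + dot(v, Au))
    return Ainv
end

# Quick check against a direct inverse
A = [4.0 1.0; 1.0 3.0]
Ainv = inv(A)
u, v = [0.1, 0.0], [0.0, 0.2]
sherman_morrison!(Ainv, u, v)
@assert Ainv ≈ inv(A + u * v')
```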
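And a sketch of the warm-start idea from the third bullet, assuming a symmetric positive definite system from a compactly supported basis; `IterativeSolvers.cg!` uses the passed vector as the initial iterate (the function and variable names here are illustrative, not Surrogates.jl internals):

```julia
using IterativeSolvers

# Warm-start the solve of the enlarged system with the previous coefficients,
# padded with zeros for the newly added node(s).
function warm_started_coefficients(A_new, f_new, coeffs_old)
    x0 = vcat(coeffs_old, zeros(length(f_new) - length(coeffs_old)))
    cg!(x0, A_new, f_new)   # conjugate gradients; x0 is the initial guess, solved in place
    return x0
end
```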
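For the partition-of-unity idea, a bare-bones 1D evaluation sketch (hypothetical types; it blends local fits with simple Shepard-style compactly supported weights rather than the extra global fit described above, but it shows why adding a node only dirties the local fit whose region contains it):

```julia
# Hypothetical 1D partition-of-unity blend of local surrogates.
struct PUMSurrogate
    centers::Vector{Float64}          # one overlapping region per center
    radius::Float64                   # regions are (c - radius, c + radius)
    locals::Vector{Function}          # one local fit per region
end

# Compactly supported bump weight; zero outside the region.
weight(x, c, r) = max(1 - abs(x - c) / r, 0.0)^2

# Evaluate the global surrogate as a normalized blend of the active local fits.
# Assumes x lies inside at least one region.
function (s::PUMSurrogate)(x)
    w = [weight(x, c, s.radius) for c in s.centers]
    W = sum(w)
    return sum(w[i] / W * s.locals[i](x) for i in eachindex(w) if w[i] > 0)
end
```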
@aml5600, do you have any more to add?
I'm not quite sure what you are looking for in a test problem. Are these applicable?
@agerlach I think you provided some good options! My thought would be that the PUM idea provides the most flexibility and effectiveness. An interesting benefit (addressed above) would be the ability to add batches of points in addition to single-point updates. I would imagine you could update/solve the individual partition surrogates in parallel before solving the global fit.
Thanks for your input @agerlach! Partition of unity looks really cool. I will implement your suggestions in the next few months, stay tuned! :D
@ludoro Congrats on getting selected to GSoC again. I was happy to see it.
This sounds good, I would also be interested in an open challenge problem of this nature for benchmark purposes.
Do you only solve the equation system once when adding a sample, or do you tweak some hyperparameters every time as well? If you are optimising the hyperparameters each time, perhaps some form of dropout (https://machinelearningmastery.com/dropout-for-regularizing-deep-neural-networks/) in combination with clustering of the samples could help reduce the cost?
I am facing a similar problem where the surrogate model creation time becomes large with many samples. In this case I am experimenting with running the creation of the surrogate model with @async while evaluating sample points using different infill criteria until the update of the surrogate model is complete. The objective function is a physical experiment, so the evaluation time is long (~35 s) and can't be changed. So far it has worked great, effectively removing the cost of the surrogate model creation completely. The number of infill points before the surrogate model is updated is often below 3. However, I suspect the optimisation progress would be negatively affected if the number of infill points becomes large before the surrogate model is updated.
Hey @MrUrq, thanks for your take. I actually never change the hyperparameters once the surrogate is defined. I did not get the part about infill criteria in combination with @async, can you explain that again?
Hi @ludoro, in my case new points in the design space are sampled while the surrogate model is being created. Once the new surrogate model is created, the previous one is replaced with it, and this process is repeated continuously. So the timeline looks something like this:
Objective function: |------| |------| |------| |------| |------| |------| |------| |------| ...
Surrogate creation: |------------------| |------------------| |------------------| ...
To avoid sampling the same points several times (since the same surrogate model is used for several samples), I cycle between different infill criteria, ranging from more exploitative to more explorative.
This is done on separate threads with @async, so it is only beneficial if there is enough processing power available to compute the surrogate while evaluating the objective function.
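An illustrative sketch of that workflow: `build_surrogate`, `propose_point`, and `physical_experiment` are hypothetical stand-ins (not the actual code), and `Threads.@spawn` is used here so the rebuild can run on another thread while the objective is evaluated.

```julia
using Base.Threads

function async_surrogate_loop(xs, ys, infill_criteria, n_rebuilds, lb, ub)
    surrogate = build_surrogate(xs, ys, lb, ub)
    for _ in 1:n_rebuilds
        # Rebuild in the background from a snapshot of the current samples.
        xs_snap, ys_snap = copy(xs), copy(ys)
        rebuild = Threads.@spawn build_surrogate(xs_snap, ys_snap, lb, ub)
        k = 0
        while !istaskdone(rebuild)
            # Cycle infill criteria (exploitative → explorative) so the same
            # stale surrogate does not keep proposing the same point.
            crit = infill_criteria[mod1(k += 1, length(infill_criteria))]
            xnew = propose_point(crit, surrogate, lb, ub)
            push!(xs, xnew)
            push!(ys, physical_experiment(xnew))   # ~35 s per evaluation
        end
        surrogate = fetch(rebuild)                 # swap in the fresh model
    end
    return surrogate, xs, ys
end
```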
I see, got it. Thanks!