Optimization for Spline damping value
Description of the desired feature:
I have a function that I use frequently, and I was wondering if others would be interested in having it included in Verde. It is an alternative to vd.SplineCV(): instead of doing a grid search over the provided damping values, it runs an optimization to find the best damping value, given lower and upper limits and a number of trials.
When you want to explore a large range of damping values, this optimization finds the best value much more quickly than the grid search in SplineCV. You can either spend the same amount of time to find a better value, or find a value that produces an equivalent score faster.
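To illustrate the intuition behind the speed claim, here is a toy sketch in plain Python. It is not what Optuna does (Optuna's default sampler is TPE). Instead it swaps in a simple golden-section search over log10(damping) and runs it against a hypothetical, unimodal score curve `toy_score` that peaks at damping = 1e-4. With the same budget of evaluations, the grid spreads its points evenly over all 11 decades, while the bounded search keeps narrowing the interval around the peak. Real cross-validation scores are noisy and need not be unimodal, so this only illustrates the idea.

```python
import math


def toy_score(damping: float) -> float:
    """Hypothetical stand-in for a cross-validation score, peaking at damping = 1e-4."""
    return -((math.log10(damping) + 4.0) ** 2)


def grid_search(score, lower: float, upper: float, n: int) -> float:
    """Evaluate n log-spaced damping values and return the best one (a grid search)."""
    lo, hi = math.log10(lower), math.log10(upper)
    dampings = [10 ** (lo + i * (hi - lo) / (n - 1)) for i in range(n)]
    return max(dampings, key=score)


def golden_section_search(score, lower: float, upper: float, n_evals: int) -> float:
    """Maximize score over log10(damping) in [lower, upper] using n_evals evaluations."""
    inv_phi = (math.sqrt(5) - 1) / 2  # ~0.618
    a, b = math.log10(lower), math.log10(upper)
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = score(10**c), score(10**d)
    # Each iteration reuses one interior point and evaluates one new point
    for _ in range(n_evals - 2):
        if fc > fd:  # the maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = score(10**c)
        else:  # the maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = score(10**d)
    return 10**c if fc > fd else 10**d


best_grid = grid_search(toy_score, 1e-10, 10, n=15)
best_search = golden_section_search(toy_score, 1e-10, 10, n_evals=15)
print(math.log10(best_grid), math.log10(best_search))
```

With 15 evaluations each, the best grid value lands about 0.29 decades from the peak, while the golden-section result is within a few hundredths of a decade.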
I use the Python package Optuna for this. I think it could be implemented similarly to SplineCV, with something like:
```
class SplineOptimize(damping_limits=(1e-10, 10), n_trials=10)
```
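One possible shape for such a class, as a minimal sketch: it assumes the proposed name and the two parameters above, plus hypothetical extras that are not part of Verde (`cv`, `sampler`, and the fitted attributes `damping_`, `score_`, `spline_`, `study_`, loosely modeled on `SplineCV`). Optuna and Verde are imported inside `fit`, so constructing the object works without them and only fitting requires Optuna.

```python
import numpy as np


class SplineOptimize:
    """
    Hypothetical sketch of a SplineCV-like estimator that tunes the damping with
    Optuna instead of a grid search. Not part of Verde.
    """

    def __init__(self, damping_limits=(1e-10, 10), n_trials=10, cv=None,
                 sampler=None, **spline_kwargs):
        lower, upper = damping_limits
        # Log-uniform sampling needs strictly positive, increasing limits
        if not 0 < lower < upper:
            raise ValueError(
                f"Invalid damping_limits {damping_limits}: need 0 < lower < upper."
            )
        self.damping_limits = (lower, upper)
        self.n_trials = n_trials
        self.cv = cv
        self.sampler = sampler
        self.spline_kwargs = spline_kwargs

    def fit(self, coordinates, data, weights=None):
        # Imported here so Optuna is only needed when this class is actually fitted
        import optuna
        import verde as vd

        def objective(trial):
            damping = trial.suggest_float("damping", *self.damping_limits, log=True)
            spline = vd.Spline(damping=damping, **self.spline_kwargs)
            scores = vd.cross_val_score(
                spline, coordinates, data, weights=weights, cv=self.cv
            )
            return float(np.mean(scores))

        study = optuna.create_study(direction="maximize", sampler=self.sampler)
        study.optimize(objective, n_trials=self.n_trials)
        self.study_ = study
        self.damping_ = study.best_params["damping"]
        self.score_ = study.best_value
        # Refit a single spline on all the data with the best damping found
        self.spline_ = vd.Spline(damping=self.damping_, **self.spline_kwargs)
        self.spline_.fit(coordinates, data, weights=weights)
        return self

    def predict(self, coordinates):
        # Delegate to the refitted spline
        return self.spline_.predict(coordinates)
```

Under these assumptions, `SplineOptimize(n_trials=15).fit(proj_coordinates, data.air_temperature_c)` would replace the manual study setup in the usage example further down.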
Here is my current implementation:
```python
from __future__ import annotations

import typing

import numpy as np
import optuna
import verde as vd


class OptimalSplineDamping:
    """
    Objective function to use in an Optuna optimization for finding the optimal damping
    value for fitting bi-harmonic splines.
    """

    def __init__(
        self,
        damping_limits: tuple[float, float],
        coordinates: tuple[np.ndarray, np.ndarray],
        data: np.ndarray,
        weights: np.ndarray | None = None,
        **kwargs: typing.Any,
    ) -> None:
        self.damping_limits = damping_limits
        self.coordinates = coordinates
        self.data = data
        self.weights = weights
        # Extra keyword arguments are passed on to vd.Spline
        self.kwargs = kwargs

    def __call__(self, trial: optuna.trial.Trial) -> float:
        """
        Parameters
        ----------
        trial : optuna.trial.Trial
            the trial to run

        Returns
        -------
        float
            the mean score of the cross-validation
        """
        # Sample the damping uniformly in log space between the limits
        damping = trial.suggest_float(
            "damping",
            self.damping_limits[0],
            self.damping_limits[1],
            log=True,
        )
        spline = vd.Spline(damping=damping, **self.kwargs)
        # Average the scores of all cross-validation folds
        return float(
            np.mean(
                vd.cross_val_score(
                    spline,
                    self.coordinates,
                    self.data,
                    weights=self.weights,
                ),
            )
        )
```
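The objective samples the damping with `log=True`, which makes Optuna draw values uniformly in log space. Since the limits (1e-10, 10) span 11 orders of magnitude, this matters a lot. The sketch below uses NumPy's random generator as a stand-in for Optuna's sampler (an assumption for illustration only) to compare uniform and log-uniform draws; `sample_damping` is a hypothetical helper.

```python
import numpy as np


def sample_damping(lower: float, upper: float, size: int, log: bool, seed: int = 0):
    """Draw damping values uniformly in linear or in log10 space."""
    rng = np.random.default_rng(seed)
    if log:
        return 10 ** rng.uniform(np.log10(lower), np.log10(upper), size)
    return rng.uniform(lower, upper, size)


lower, upper = 1e-10, 10
linear = sample_damping(lower, upper, 10_000, log=False)
logarithmic = sample_damping(lower, upper, 10_000, log=True)

# Fraction of draws in the lowest 5 of the 11 decades (damping < 1e-5)
print((linear < 1e-5).mean())       # expected fraction ~1e-6
print((logarithmic < 1e-5).mean())  # expected fraction 5/11 ~ 0.45
```

Without `log=True`, almost every trial would be spent on damping values between 1 and 10, and small dampings would essentially never be tried.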
And it's used like this:
```python
# define a study that maximizes the cross-validation score
study = optuna.create_study(
    direction="maximize",
    # optionally specify a sampler
    # sampler=optuna.integration.BoTorchSampler(n_startup_trials=4)
)
# run the optimization
study.optimize(
    OptimalSplineDamping(
        damping_limits=(1e-10, 10),
        coordinates=proj_coordinates,
        data=data.air_temperature_c,
    ),
    n_trials=15,
)
```
Afterwards, the best mean cross-validation score and the corresponding damping value can be read from the study object:
```python
study.best_trial.value
study.best_trial.params
```
Are you willing to help implement and maintain this feature?
Yes, if it's something people are interested in and if we're OK with adding Optuna as a dependency.
This looks cool! I hadn't seen Optuna before. Do you think it would be possible to have something that works with it without having to add it as a dependency?
As in a custom optimization implementation? Or just making Optuna an optional dependency? It could definitely be an optional dependency; vd.SplineOptimize() would then just raise a warning if users call it without Optuna installed.
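For the optional-dependency route, one common pattern is to import the package only when the feature is used. The sketch below raises an ImportError with an install hint rather than a warning, since the optimization cannot run without Optuna; `import_optional` is a hypothetical helper name, not an existing Verde function.

```python
import importlib
from types import ModuleType


def import_optional(name: str, feature: str) -> ModuleType:
    """Import an optional dependency, or fail with a message naming the feature."""
    try:
        return importlib.import_module(name)
    except ImportError as error:
        raise ImportError(
            f"{feature} requires the optional dependency '{name}'. "
            f"Install it (for example with 'pip install {name}') to use it."
        ) from error


# Hypothetical use at the start of SplineOptimize.fit:
# optuna = import_optional("optuna", "SplineOptimize")
```

This keeps `import verde` working for everyone, and only users who call the new class need Optuna installed.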