[BUG] Cannot provide controls u for cross validation with scikit-learn
When you provide a control variable u to fit SINDy with cross-validation from scikit-learn, the following error is raised:
TypeError: Model was fit using control variables, so u is required
It seems like this error occurs in the predict and/or score functions: https://pysindy.readthedocs.io/en/latest/_modules/pysindy/pysindy.html
Possibly, the controls u, which are provided to the fit function for cross validation, are not passed on to these other functions (predict and/or score) during cross validation?
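For reference, the error can be triggered directly, outside of cross-validation (a minimal sketch with placeholder data; the variable names here are illustrative only):

import numpy as np
import pysindy as ps

x = np.random.standard_normal((100, 3))  # placeholder state trajectory
u = np.ones(100)                         # placeholder control input

model = ps.SINDy()
model.fit(x, u=u)  # after this, model.n_control_features_ > 0
model.score(x)     # raises the TypeError above, since no u is passed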
Reproducing code example:
This code example is taken from here: https://pysindy.readthedocs.io/en/latest/examples/4_scikit_learn_compatibility.html#cross-validation
However, I have added one constant control variable u to show that the code fails when I pass u as an argument: search.fit(x_train, u=u_train). I have highlighted the relevant rows with the comment "RELEVANT ROW".
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import TimeSeriesSplit
import numpy as np
from scipy.integrate import odeint
import pysindy as ps

def lorenz(z, t):
    return [
        10 * (z[1] - z[0]),
        z[0] * (28 - z[2]) - z[1],
        z[0] * z[1] - (8 / 3) * z[2],
    ]

# Generate training data
dt = 0.002
t_train = np.arange(0, 10, dt)
x0_train = [-8, 8, 27]
x_train = odeint(lorenz, x0_train, t_train)

# Make up control input
u_train = np.ones(x_train.shape[0])  # RELEVANT ROW

# Fit model
model = ps.SINDy(t_default=dt)
param_grid = {
    "optimizer__threshold": [0.001, 0.01, 0.1],
    "optimizer__alpha": [0.01, 0.05, 0.1],
    "feature_library": [ps.PolynomialLibrary(), ps.FourierLibrary()],
    "differentiation_method__order": [1, 2],
}

search = GridSearchCV(
    model,
    param_grid,
    cv=TimeSeriesSplit(n_splits=5),
)
search.fit(x_train, u=u_train)  # RELEVANT ROW

print("Best parameters:", search.best_params_)
search.best_estimator_.print()
Error message:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-76-6c5d752be13a> in <module>
38 cv=TimeSeriesSplit(n_splits=5)
39 )
---> 40 search.fit(x_train, u = u_train) # RELEVANT ROW
41
42 print("Best parameters:", search.best_params_)
~/.cache/pypoetry/virtualenvs/powerup-process-control-VY7sLJ9r-py3.8/lib/python3.8/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
70 FutureWarning)
71 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72 return f(**kwargs)
73 return inner_f
74
~/.cache/pypoetry/virtualenvs/powerup-process-control-VY7sLJ9r-py3.8/lib/python3.8/site-packages/sklearn/model_selection/_search.py in fit(self, X, y, groups, **fit_params)
734 return results
735
--> 736 self._run_search(evaluate_candidates)
737
738 # For multi-metric evaluation, store the best_index_, best_params_ and
~/.cache/pypoetry/virtualenvs/powerup-process-control-VY7sLJ9r-py3.8/lib/python3.8/site-packages/sklearn/model_selection/_search.py in _run_search(self, evaluate_candidates)
1186 def _run_search(self, evaluate_candidates):
1187 """Search all candidates in param_grid"""
-> 1188 evaluate_candidates(ParameterGrid(self.param_grid))
1189
1190
~/.cache/pypoetry/virtualenvs/powerup-process-control-VY7sLJ9r-py3.8/lib/python3.8/site-packages/sklearn/model_selection/_search.py in evaluate_candidates(candidate_params)
706 n_splits, n_candidates, n_candidates * n_splits))
707
--> 708 out = parallel(delayed(_fit_and_score)(clone(base_estimator),
709 X, y,
710 train=train, test=test,
~/.cache/pypoetry/virtualenvs/powerup-process-control-VY7sLJ9r-py3.8/lib/python3.8/site-packages/joblib/parallel.py in __call__(self, iterable)
1027 # remaining jobs.
1028 self._iterating = False
-> 1029 if self.dispatch_one_batch(iterator):
1030 self._iterating = self._original_iterator is not None
1031
~/.cache/pypoetry/virtualenvs/powerup-process-control-VY7sLJ9r-py3.8/lib/python3.8/site-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
845 return False
846 else:
--> 847 self._dispatch(tasks)
848 return True
849
~/.cache/pypoetry/virtualenvs/powerup-process-control-VY7sLJ9r-py3.8/lib/python3.8/site-packages/joblib/parallel.py in _dispatch(self, batch)
763 with self._lock:
764 job_idx = len(self._jobs)
--> 765 job = self._backend.apply_async(batch, callback=cb)
766 # A job can complete so quickly than its callback is
767 # called before we get here, causing self._jobs to
~/.cache/pypoetry/virtualenvs/powerup-process-control-VY7sLJ9r-py3.8/lib/python3.8/site-packages/joblib/_parallel_backends.py in apply_async(self, func, callback)
206 def apply_async(self, func, callback=None):
207 """Schedule a func to be run"""
--> 208 result = ImmediateResult(func)
209 if callback:
210 callback(result)
~/.cache/pypoetry/virtualenvs/powerup-process-control-VY7sLJ9r-py3.8/lib/python3.8/site-packages/joblib/_parallel_backends.py in __init__(self, batch)
570 # Don't delay the application, to avoid keeping the input
571 # arguments in memory
--> 572 self.results = batch()
573
574 def get(self):
~/.cache/pypoetry/virtualenvs/powerup-process-control-VY7sLJ9r-py3.8/lib/python3.8/site-packages/joblib/parallel.py in __call__(self)
250 # change the default number of processes to -1
251 with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 252 return [func(*args, **kwargs)
253 for func, args, kwargs in self.items]
254
~/.cache/pypoetry/virtualenvs/powerup-process-control-VY7sLJ9r-py3.8/lib/python3.8/site-packages/joblib/parallel.py in <listcomp>(.0)
250 # change the default number of processes to -1
251 with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 252 return [func(*args, **kwargs)
253 for func, args, kwargs in self.items]
254
~/.cache/pypoetry/virtualenvs/powerup-process-control-VY7sLJ9r-py3.8/lib/python3.8/site-packages/sklearn/model_selection/_validation.py in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, return_n_test_samples, return_times, return_estimator, error_score)
558 else:
559 fit_time = time.time() - start_time
--> 560 test_scores = _score(estimator, X_test, y_test, scorer)
561 score_time = time.time() - start_time - fit_time
562 if return_train_score:
~/.cache/pypoetry/virtualenvs/powerup-process-control-VY7sLJ9r-py3.8/lib/python3.8/site-packages/sklearn/model_selection/_validation.py in _score(estimator, X_test, y_test, scorer)
603 scorer = _MultimetricScorer(**scorer)
604 if y_test is None:
--> 605 scores = scorer(estimator, X_test)
606 else:
607 scores = scorer(estimator, X_test, y_test)
~/.cache/pypoetry/virtualenvs/powerup-process-control-VY7sLJ9r-py3.8/lib/python3.8/site-packages/sklearn/metrics/_scorer.py in __call__(self, estimator, *args, **kwargs)
88 *args, **kwargs)
89 else:
---> 90 score = scorer(estimator, *args, **kwargs)
91 scores[name] = score
92 return scores
~/.cache/pypoetry/virtualenvs/powerup-process-control-VY7sLJ9r-py3.8/lib/python3.8/site-packages/sklearn/metrics/_scorer.py in _passthrough_scorer(estimator, *args, **kwargs)
370 def _passthrough_scorer(estimator, *args, **kwargs):
371 """Function that wraps estimator.score"""
--> 372 return estimator.score(*args, **kwargs)
373
374
~/.cache/pypoetry/virtualenvs/powerup-process-control-VY7sLJ9r-py3.8/lib/python3.8/site-packages/pysindy/pysindy.py in score(self, x, t, x_dot, u, multiple_trajectories, metric, **metric_kws)
461 if u is None or self.n_control_features_ == 0:
462 if self.n_control_features_ > 0:
--> 463 raise TypeError(
464 "Model was fit using control variables, so u is required"
465 )
TypeError: Model was fit using control variables, so u is required
PySINDy/Python version information:
'1.2.0'
Thanks for the detailed bug report. It makes it much easier for us to pin down the issue.
> Possibly, the controls u, which are provided to the fit function for cross validation, are not passed on to these other functions (predict and/or score) during cross validation?
You're exactly right. I will have to investigate potential solutions.
@juliuskittler @briandesilva Did you find a solution? I'm facing the same issue.
@slimeth I don't think there is any solution yet for Grid Search with CV in scikit-learn. I switched to using Bayesian Optimization for hyperparameter tuning: https://optuna.org
I have been slow about implementing a solution, partly because it would require nontrivial changes to the API. I'll prioritize getting out a fix, but it may live in a separate branch for the time being.
> @slimeth I don't think there is any solution yet for Grid Search with CV in scikit-learn. I switched to using Bayesian Optimization for hyperparameter tuning: https://optuna.org
I'd like to second the suggestion to use Bayesian optimization. It's a much more efficient approach than grid or random search. Thanks, @juliuskittler, for suggesting it!
@juliuskittler Optuna works like a charm! Thank you for this suggestion!
@juliuskittler @slimeth : Is it possible to share a code snippet on use of optuna for SINDy hyperparameter tuning with control variables? I have the same issue.
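For reference, a minimal Optuna sketch, reusing x_train, u_train, and dt from the reproducing example above. The search space, trial count, and fold handling are illustrative assumptions, not code from the commenters; because the CV loop is driven by hand, u can be sliced alongside x for every fold, which sidesteps the bug entirely:

import numpy as np
import optuna
import pysindy as ps
from sklearn.model_selection import TimeSeriesSplit

def objective(trial):
    # Search space loosely mirroring the GridSearchCV example (assumed ranges).
    threshold = trial.suggest_float("threshold", 1e-3, 1e-1, log=True)
    alpha = trial.suggest_float("alpha", 1e-2, 1e-1, log=True)

    scores = []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(x_train):
        model = ps.SINDy(
            optimizer=ps.STLSQ(threshold=threshold, alpha=alpha),
            t_default=dt,
        )
        # Slice the controls with the same indices as the states.
        model.fit(x_train[train_idx], u=u_train[train_idx])
        scores.append(model.score(x_train[test_idx], u=u_train[test_idx]))
    return float(np.mean(scores))

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print("Best parameters:", study.best_params)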
I believe this relates to SINDy() not implementing the full Estimator interface. It is not quite clear in the documentation, but score() cannot accept any arguments other than X and y. The code for GridSearchCV.fit bundles up the extra fit_params and sends them to _fit_and_score(), but the latter only forwards these params to fit(), not to score().
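One possible workaround, given that scikit-learn only ever hands X to score(): stack the control inputs onto the state matrix as extra columns and split them back out inside a thin subclass, so CV splitters slice states and controls together. This is a sketch, not an official fix; the SINDyCV name and n_control parameter are made up here, and the constructor arguments are assumed from the PySINDy 1.2.0 signature:

import numpy as np
import pysindy as ps
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

class SINDyCV(ps.SINDy):
    # Hypothetical wrapper: the last n_control columns of X are treated
    # as control inputs, so they travel through CV splits with the states.
    def __init__(self, optimizer=None, feature_library=None,
                 differentiation_method=None, feature_names=None,
                 t_default=1, discrete_time=False, n_control=1):
        super().__init__(optimizer=optimizer,
                         feature_library=feature_library,
                         differentiation_method=differentiation_method,
                         feature_names=feature_names,
                         t_default=t_default,
                         discrete_time=discrete_time)
        self.n_control = n_control

    def _split(self, X):
        # Separate the state columns from the trailing control columns.
        return X[:, :-self.n_control], X[:, -self.n_control:]

    def fit(self, x, **kwargs):
        x, u = self._split(x)
        return super().fit(x, u=u, **kwargs)

    def score(self, x, **kwargs):
        x, u = self._split(x)
        return super().score(x, u=u, **kwargs)

# Usage, reusing x_train, u_train, dt, and param_grid from the example above:
xu_train = np.hstack([x_train, u_train.reshape(-1, 1)])
search = GridSearchCV(SINDyCV(t_default=dt), param_grid,
                      cv=TimeSeriesSplit(n_splits=5))
search.fit(xu_train)

Note that TimeSeriesSplit produces contiguous index ranges, so the dt-based differentiation inside fit and score remains valid on each fold.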