pysindy icon indicating copy to clipboard operation
pysindy copied to clipboard

Clarify and minimize pysindy.differentiation API

Open Jacob-Stevens-Haas opened this issue 2 years ago • 3 comments

I'd like to suggest moving the pysindy package out of the business of calculating derivatives and improve the interface to allow other packages to plug into pysindy.

Problems with the API

  1. Problem: BaseDifferentiation does not fully implement sklearn BaseEstimator, meaning any hyperparameter expecting the BaseEstimator API may have issues. E.g. the following raises AssertionError
sklearn.utils.estimator_checks.check_estimator(ps.SINDyDerivative(kind="kalman"))
  1. Problem: #244 avoids the appearance of breaking backwards compatibility by adding smoothed_x_ attribute to subclasses of BaseDifferentiation. However, it also changes calc_trajectory to expect that attribute, meaning that the PR does break backwards compatibility for anyone who has implemented a custom differentiator (which may be nobody). The current public API for creating a differentiation method is to subclass BaseDifferentiation and implement a _differentiate method that returns:
x_dot: array-like, shape (n_samples, n_input_features)
  Numerical time derivative of x. Entries where derivatives were
  not computed will have the value np.nan.

Because BaseDifferentiation also implements __call__, it's also possible to just use a function.
3. Problem: This research group actually has TWO packages for numerical smoothing/differentiation in python: derivative and pynumdiff, as well as the methods inside pysindy.

Potential Solutions:

  1. Remove all methods for numerical differentiation inside pysindy.
  2. Merge pynumdiff and derivative (https://github.com/florisvb/PyNumDiff/issues/48).
  3. Change the BaseDifferentiation API to:
def _differentiate(x: np.ndarray) -> tuple[np.ndarray]:
    """...
    Returns:
        Sequence whose element ``n`` is the nth order (potentially smoothed) derivative of x
    """"

What I like about this return type is that it could be extended to antiderivatives or even spatial derivatives in a backwards compatible manner. It also allows directly passing a callable, e.g. a functools.partialfunc of any of the functions in derivative or pynumdiff. But I want to think through this API, the name of the function, and whether the sklearn BaseEstimator interface is necessary, etc.

Jacob-Stevens-Haas avatar Apr 24 '23 19:04 Jacob-Stevens-Haas

I'm open to this in principle but haven't done a deep look at the pynumdiff or derivative packages. The only thing that I'd like to maintain is that the simplest finite difference and spectral derivative implementations run as fast as possible with good numpy vectorization. I implemented the current finite differences with particular focus on speeding up the library building for PDEs, since slow runtime was (and still is) a major limiting factor in the ability to conduct research. For more advanced differentiation techniques like smoothing, I think it's fine if things aren't the most efficient, since these are usually implemented at a refining stage after code is working reasonably well in simple cases. Ideally, I'd like to see a little benchmarking on a typical usage with a high-dimensional challenging problem. I'd also be happy to contribute to speeding up the derivative or pynumdiff a bit for the simplest use cases if the current pysindy versions do run faster.

znicolaou avatar Apr 24 '23 19:04 znicolaou

That's a great point - I believe the pysindy finite difference is faster than derivative, and I'd agree that a precondition for removing it is that it is incorporated into the successor of pynumdiff or derivative.

Jacob-Stevens-Haas avatar Apr 24 '23 19:04 Jacob-Stevens-Haas

Agreed that it would be great to get out of the differentiation business, and also agreed with Zack that it would be great to retain the fast finite differences. Happy with whatever you end up pursuing in this vein.

akaptano avatar Apr 24 '23 21:04 akaptano