csaps icon indicating copy to clipboard operation
csaps copied to clipboard

csaps.CubicSmoothingSpline does not handle periodic functions correctly

Open shusheer opened this issue 3 years ago • 9 comments

csaps.CubicSmoothingSpline purports to support periodic functions, but does not implement this correctly.

I believe this is because it is a smoothing spline, which means that there are far more knots stored than "effective knots" due to smoothing. This means that when the periodic function is used by the underlying scipy implementation, which likely only looks at the first and last knots, you get a sharp discontinuity.

This is all purely speculation!

A not-brilliant-fix is to repeat your data -1 and +1 period and fit just the usual (non-periodic) smoothed spline to this. There will not be any huge discontinuity, but this does not guarantee that your endpoints will match exactly, so it's better but not perfect. Note that you can't have identical x values in your input, so you need to offset very slightly, or average your endpoints.

The correct solution is probably to have the smoother correctly consider periodic functions, either when you generate the smoother with CubicSmoothingSpline() which would require an additional parameter for periodic functions, or re-implementing the underlying scipy for interpolating/extrapolating (which is probably more work).

You can test this with the following code:

import numpy as np import matplotlib.pyplot as plt

from csaps import csaps

np.random.seed(1234)

x = np.linspace(-5., 5., 25) y = np.exp(-(x/2.5)**2) + (np.random.rand(25) - 0.2) * 0.3

smoother = csaps(x, y, smooth=0.85)

xs = np.linspace(-6.5, 6.5, 100)

ys = smoother(xs) ys_periodic = smoother(xs, extrapolate='periodic')

better_smoother = csaps(np.r_[x-10.001,x,x+10.001], np.r_[y,y,y], smooth=0.85)

ys_better = better_smoother(xs)

plt.plot(x, y, 'o', xs, ys, '-', xs, ys_periodic, '-', xs, ys_better, '-') plt.show()

print("non-periodic", smoother([-5,5])) print("periodic",smoother([-5,5], extrapolate='periodic')) print("better, but not perfect",better_smoother([-5,5]))

shusheer avatar May 27 '21 11:05 shusheer

@shusheer,

Thank you for your report. You are right, periodic extrapolation from scipy PPoly seems to work incorrectly with smoothing spline. Unfortunately, I have never tested this feature.

If we look at scipy PPoly __call__ method we see the following code:

# With periodic extrapolation we map x to the segment
# [self.x[0], self.x[-1]].
if extrapolate == 'periodic':
    x = self.x[0] + (x - self.x[0]) % (self.x[-1] - self.x[0])
    extrapolate = False

I think we might implement it in the right way for the smoothing spline case. PR welcome :)

espdev avatar May 27 '21 19:05 espdev

Ok, I had a play with this, and think I have a more elegant solution, which could be more widely useful. The basic idea is to compress the representation of any CubicSmoothingSpline into a CubicSpline, and then periodic extrapolation "just works".

For any particular level of smoothing, the set of breakpoints and coefficients for a CubicSmoothingSpline over the repeated-x-range described above, can be reduced to a subset of breakpoints, which are located at the x-values of the zero-crossings of the 3rd derivative evaluated at the original (non-repeated) x values, as well as the terminal original x-values of the series. Because the 3rd derivative is piecewise constant, the zero-crossing x-value is simply the x-value of the i+1th break where the zero crossing occurs between break[i] and break[i+1]. We evaluate the CubicSmoothingSpline at these points, to produce new smoothed y values. We need to average y[0] and y[-1] in order to ensure our periodic function works as expected. These new x,y points then define a new CubicSmoothingSpline with smoothing=1.0 (or indeed simply a scipy.interpolate.CubicSpline), for which periodic extrapolation now works as expected.

This compressed representation of the CubicSmoothingSpline is more memory and computationally efficient than either the repeated-x-range hack I described above, or indeed a smoothed spline of any smoothing value < 1.0. This is true not just for the periodic case, where I needed to repeat data, but generally all CubicSmoothingSplines with smoothing value < 1.0. This becomes important for people wanting to smooth large quantities of data with heavy smoothing, and might be particularly important for the ND grid case (though the zero-crossings won't in general lie on a reduced grid). So you may wish to use this compressed representation for all internal representation (i.e. modifying csaps.SplinePPForm).

Below is example code showing what I mean. There does appear to be some kind of discontinuity in the first derivative of the CubicSmoothingSpline with smoothing=1.0 case (orange line on graph). This is not present when using a scipy.interpolate.CubicSpline with periodic boundary, so this suggests we should use this as the underlying representation (or copy their method of periodic boundary setting).

import numpy as np import matplotlib.pyplot as plt

from csaps import csaps from scipy.interpolate import CubicSpline

np.random.seed(1234)

x = np.linspace(-5., 5., 25) y = np.exp(-(x/2.5)**2) + (np.random.rand(25) - 0.2) * 0.3

smoother = csaps(x, y, smooth=0.85)

xs = np.linspace(-7.5, 7.5, 100)

ys = smoother(xs) ys_periodic = smoother(xs, extrapolate='periodic')

better_smoother = csaps(np.r_[x[:-1]-10,x,x[1:]+10], np.r_[y[:-1],y,y[1:]], smooth=0.85)

ys_better = better_smoother(xs)

idx_before_zero_crossings = np.where(np.diff(np.sign(better_smoother(x, nu=3))))[0]

x_compressed = np.unique(np.r_[x[0],x[idx_before_zero_crossings+1],x[-1]]) y_compressed = better_smoother(x_compressed) y_compressed[0] = y_compressed[-1] = np.mean(y_compressed[[0,-1]])

even_better_smoother = csaps(x_compressed, y_compressed, smooth=1.0)

best_smoother = CubicSpline(x_compressed, y_compressed, bc_type='periodic', extrapolate='periodic')

ys_even_better = even_better_smoother(xs,extrapolate='periodic') ys_best = best_smoother(xs)

plt.plot(x, y, 'o', xs, ys, '-', xs, ys_periodic, '-', xs, ys_better, '-', xs, ys_even_better, '-', xs, ys_best, '-') plt.show()

print("non-periodic", smoother([-5,5])) print("periodic",smoother([-5,5], extrapolate='periodic')) print("better, but not perfect",better_smoother([-5,5])) print("even_better",even_better_smoother([-5,5],extrapolate='periodic')) print("correct",best_smoother([-5,5]))

fig, axs = plt.subplots(4, 2, sharex=True, figsize=(15,15)) fig.suptitle('Derivatives 0-3: left = various smoothers, right = better_smoother-best_smoother') axs[0, 0].plot(xs, better_smoother(xs, nu=0), '-') axs[0, 0].plot(xs, even_better_smoother(xs, nu=0,extrapolate='periodic'), '-') axs[0, 0].plot(xs, best_smoother(xs, nu=0), '-') axs[0, 0].plot(x_compressed, best_smoother(x_compressed, nu=0), 'o') axs[0, 0].plot(x, better_smoother(x, nu=0), '.') axs[0, 1].plot(xs, better_smoother(xs, nu=0)-best_smoother(xs, nu=0,extrapolate='periodic'), '-') axs[1, 0].plot(xs, better_smoother(xs, nu=1), '-') axs[1, 0].plot(xs, even_better_smoother(xs, nu=1,extrapolate='periodic'), '-') axs[1, 0].plot(xs, best_smoother(xs, nu=1), '-') axs[1, 0].plot(x_compressed, best_smoother(x_compressed, nu=1), 'o') axs[1, 0].plot(x, better_smoother(x, nu=1), '.') axs[1, 1].plot(xs, better_smoother(xs, nu=1)-best_smoother(xs, nu=1), '-') axs[2, 0].plot(xs, better_smoother(xs, nu=2), '-') axs[2, 0].plot(xs, even_better_smoother(xs, nu=2,extrapolate='periodic'), '-') axs[2, 0].plot(xs, best_smoother(xs, nu=2), '-') axs[2, 0].plot(x_compressed, best_smoother(x_compressed, nu=2), 'o') axs[2, 0].plot(x, better_smoother(x, nu=2), '.') axs[2, 1].plot(xs, better_smoother(xs, nu=2)-best_smoother(xs, nu=2), '-') axs[3, 0].plot(xs, better_smoother(xs, nu=3), '-') axs[3, 0].plot(xs, even_better_smoother(xs, nu=3,extrapolate='periodic'), '-') axs[3, 0].plot(xs, best_smoother(xs, nu=3), '-') axs[3, 0].plot(x_compressed, best_smoother(x_compressed, nu=3), 'o') axs[3, 0].plot(x, better_smoother(x, nu=3), '.') axs[3, 1].plot(xs, better_smoother(xs, nu=3)-best_smoother(xs, nu=3), '-')

plt.show()

shusheer avatar May 27 '21 21:05 shusheer

Thanks for the explanation and the code example!

There does appear to be some kind of discontinuity in the first derivative of the CubicSmoothingSpline with smoothing=1.0 case (orange line on graph). This is not present when using a scipy.interpolate.CubicSpline with periodic boundary, so this suggests we should use this as the underlying representation (or copy their method of periodic boundary setting).

CubicSmoothingSpline has natural boundary type. It seems CubicSpline from scipy also works incorrectly with bc_type='natural' and extrapolate='periodic'.

See example:

import numpy as np
import matplotlib.pyplot as plt

from csaps import CubicSmoothingSpline
from scipy.interpolate import CubicSpline

np.random.seed(1234)

x = np.linspace(-5., 5., 25)
xs = np.linspace(-7.5, 7.5, 100)
y = np.exp(-(x/2.5)**2) + (np.random.rand(25) - 0.2) * 0.3

ss = CubicSmoothingSpline(x, y, smooth=1.0)
cs = CubicSpline(x, y, bc_type='natural')

ys = ss(xs, extrapolate='periodic')
yc = cs(xs, extrapolate='periodic')

plt.plot(x, y, 'o')
plt.plot(xs, ys, '-', linewidth=2, label='ss')
plt.plot(xs, yc, '-', linewidth=1, label='cs')
plt.legend()
plt.show()

CubicSmoothingSpline and CubicSpline work the same way in this case.

2021-05-28_11-20-06

espdev avatar May 28 '21 08:05 espdev

Good investigative powers ;)

So I think what needs to be done is the pattern for CubicSpline should be followed, where there is a bc_type parameter on creating the spline. This can default to current behavior, but when you want periodic behavior, it is set.

My understanding of the csaps code is insufficiently deep to actually implement it, but I think in _sspumv.py at def _make_spline(x, y, w, smooth, shape) you would introduce this parameter, and I think you would use slightly different formulation for the periodic boundary case here, thus avoiding my hack of repeating data points either side of the window of interest.

This would then allow proper periodic behaviour without needing my break-compression method described above, so that only becomes an optional extra that you could tack on the end of _make_spline regardless of boundary condition type. The break-compression scheme really is only necessary if people are dealing with huge amounts of data and consequently huge numbers of breaks though - if not necessary for getting periodic splines working, it may not be worth the inevitable future code maintenance overhead.

shusheer avatar May 28 '21 11:05 shusheer

By the way, the big jump at the boundary in your example code above is due to failing to match the endpoint conditions in the input data. Insert this line:

y[0] = y[-1] = np.mean(y[[0,-1]])

and it goes away.

See this in the notes for bc_type of CubicSpline:

‘periodic’: The interpolated functions is assumed to be periodic of period x[-1] - x[0]. The first and last value of y must be identical: y[0] == y[-1]. This boundary condition will result in y'[0] == y'[-1] and y''[0] == y''[-1].

My code above is also faulty in this regard - but once corrected, there is still a (smaller) evident boundary condition jump in the second derivative for CubicSmoothingSpline vs CubicSpline when CubicSpline is set up with bc_type='periodic'

shusheer avatar May 28 '21 11:05 shusheer

My understanding of the csaps code is insufficiently deep to actually implement it, but I think in _sspumv.py at def _make_spline(x, y, w, smooth, shape) you would introduce this parameter, and I think you would use slightly different formulation for the periodic boundary case here, thus avoiding my hack of repeating data points either side of the window of interest.

I will think about it. It would be nice to add various boundary conditions (the same as in CubicSpline). Also it relates to https://github.com/espdev/csaps/issues/34

espdev avatar May 28 '21 12:05 espdev

Whilst you're modifying _make_spline() you might want to look at this error path: if not all(dx > 0): raise ValueError ...

There's no good reason for this to break the code - it just makes things brittle when in production, or forces the user to wrap your code in something that does this assuming they notice the requirement. Instead, use numpy.unique to sort and group all the unique x values, and average them (this would be the same least-squares effect as repeated points).

shusheer avatar May 28 '21 13:05 shusheer

There's no good reason for this to break the code

I think explicit is better than implicit. Also if we try to set the same x values to CubicSpline we get a similar error:

ValueError: `x` must be strictly increasing sequence.

The code from scipy:

    dx = np.diff(x)
    if np.any(dx <= 0):
        raise ValueError("`x` must be strictly increasing sequence.")

I think csaps should not perform any implicit pre-processing and correcting input data. A user may not get what they expect.

espdev avatar May 28 '21 13:05 espdev

I assume it will take you some time to look into the correct boundary conditions for periodic functions.

In the meantime, here is a function that applies my earlier "compression" technique to produce a scipy CubicSpline that is very close to the CubicSmoothingSpline - this can be used for natural splines, or with periodic splines if you replicate your data at - / + 1 period and then set the xrange parameter.

# Compress a csaps spline into a scipy CubicSpline
# Calculates the zero-crossings and local min/max between these in the 3rd derivative as key points to keep
# CubicSmoothingSpline polynomials are stored such that between breaks[i] and breaks[i+1] then coeffs[:,i]
# are used, with coeffs[0,:] being the 3rd degree. So if sign(coeffs[0,i])) is -1 and sign(coeffs[0,i+1]) is +1,
# then the zero crossing is infentesimally before coeffs[0,i+1]. Therefore keep breaks[i+1] as a point of interest.
# We can also calculate the max/min between zero crossings, corresponding to the zero-crossing of the 4th derivative.

def CubicSplineFromCubicSmoothingSpline(smoothing_spline, xrange=None, bc_type='natural', extrapolate=None):
    # If using this to generate a periodic spline from  a csaps spline, data is usually replicated
    # Therefore restrict the xrange to the actual period to be modelled
    x = smoothing_spline.spline.breaks
    y3 = smoothing_spline.spline.coeffs[0,:]*6
    if xrange:
        assert xrange[0] < xrange[1]
        xminidx = max(0,np.flatnonzero(np.r_[xrange[0],x]<=xrange[0])[-1]-1)
        xmaxidx = min(x.size-1,np.flatnonzero(np.r_[x,xrange[1]]>=xrange[1])[0])
        x = x[xminidx:xmaxidx+1].copy()
        x[0]=max(xrange[0],x[0])
        x[-1]=min(xrange[1],x[-1])
        y3 = y3[xminidx:xmaxidx+1]
    idxs_to_use = [0,x.size-1]
    idx_before_zero_crossings = np.where(np.diff(np.sign(y3)))[0]
    idxs_to_use.extend(idx_before_zero_crossings.tolist())
    for aidx,bidx in zip(np.r_[0,idx_before_zero_crossings],np.r_[idx_before_zero_crossings,x.size-1]):
        idx = np.argmax(np.abs(y3[aidx:bidx+1]))+aidx
        idxs_to_use.append(idx)
    x_compressed = np.unique(x[idxs_to_use])
    y_compressed = smoothing_spline(x_compressed)
    if bc_type=='periodic':
        y_compressed[0] = y_compressed[-1] = np.mean(y_compressed[[0,-1]])
    return CubicSpline(x_compressed, y_compressed, bc_type=bc_type, extrapolate=extrapolate)

Example usage is also shown

import numpy as np
import matplotlib.pyplot as plt

from csaps import csaps
from scipy.interpolate import CubicSpline

np.random.seed(1234)

x = np.linspace(-5., 5., 25)
y = np.exp(-(x/2.5)**2) + (np.random.rand(25) - 0.2) * 0.3

repeated_smoother = csaps(np.r_[x[:-1]-10,x,x[1:]+10], np.r_[y[:-1],y,y[1:]], smooth=0.85)

periodic_spline = CubicSplineFromCubicSmoothingSpline(repeated_smoother, xrange=(x[0],x[-1]), bc_type='periodic')

fig,(ax1,ax2)=plt.subplots(1, 2, figsize=(12,4))
xs = np.linspace(-5., 5., 250)
ax1.plot(x, y, 'o', xs, repeated_smoother(xs), '-', xs, periodic_spline(xs), '-', periodic_spline.x, periodic_spline(periodic_spline.x), 'o')
xs = np.linspace(-20., 20., 250)
ax2.plot(x, y, 'o', xs, repeated_smoother(xs), '-', xs, periodic_spline(xs), '-')
plt.show()

shusheer avatar Jun 05 '21 20:06 shusheer