piecewise_linear_fit_py
piecewise_linear_fit_py copied to clipboard
Non-monotonic dataset
Do you have any recommendations in case the dataset is not monotonically increasing as in the simple example below? It is clear that there are 2 linear segments but I suspect that the non-montonic behavior in x is creating the issue...
x = np.array([0.42486339, 0.55889496, 0.70537341, 0.79477838, 0.91180935, 1. , 0.95309654, 0.85868245, 0.77580449, 0.68867638, 0.60731633, 0.523983 , 0.47161506, 0.39268367, 0.3167881 , 0.21584699, 0.11399514, 0. ])
y = np.array([0. , 0.12927029, 0.25908718, 0.34517628, 0.48018584, 0.64744466, 0.828915 , 0.83684067, 0.83602077, 0.84312654, 0.81279038, 0.81661656, 0.80595791, 0.81361028, 0.79967204, 0.84531293, 0.9150041 , 1. ])
my_pwlf = pwlf.PiecewiseLinFit(x, y)
res = my_pwlf.fit(2)
xHat = np.linspace(min(x), max(x), num=10000)
yHat = my_pwlf.predict(xHat)
plt.figure()
plt.plot(x, y, 'o')
plt.plot(xHat, yHat, '-')
plt.show()
If you are trying to fit a model that looks like:
It's not really going to be possible with pwlf. (I'm not sure if this is the type of fit you mean by monotonic)
There are some tricks you can do with arc lengths, but the setup and normalization can be tricky. One case is you can have one model, that predicts x and y as a function of the cumulative arc length. It may or may not work, and normalization will be very important. I can perhaps try to work on an example that does such.
Yes, the purple lines are exactly the type of fit that I would like to get. In general, we can expect only 1 sharp kink and the purpose is to detect the direction of the lines before and after.
I saw that the same kind of data works just fine as long as the points define a function. For example, if you rotate the points in the picture so that there is only 1 y for a given x position, then the fit will work fine.
Although it would be possible the apply a rotation to the data before doing the pmlf fit, I wonder if it is not possible to do something smarter within pmlf itself. Does it make sense?
Well.. Instead of doing a rotation, you could break x and y into separate 1 dimensional predictions. Let's say that both x and y are functions of the arc length distance from start to end. Then you can fit a 1D pwlf to x, and a 1D pwlf to y. In the end, you'll get something like this.
It's just a different way of thinking about the problem, I'm not sure how much help it will be. I can post the code that generated this if you are interested. It is one 3 line segment model to x, and another 3 segment model to y.
Could you please share the code? It looks very good.
When you combine the 2 fits back together, how can you keep track of:
- the linear coefficients for all the intermediate pieces
- where the breakpoints are
import numpy as np
import matplotlib.pyplot as plt
import pwlf
from scipy.spatial.distance import cdist
x = np.array([0.42486339, 0.55889496, 0.70537341, 0.79477838, 0.91180935,
1. , 0.95309654, 0.85868245, 0.77580449, 0.68867638,
0.60731633, 0.523983 , 0.47161506, 0.39268367, 0.3167881 ,
0.21584699, 0.11399514, 0. ])
y = np.array([0. , 0.12927029, 0.25908718, 0.34517628, 0.48018584,
0.64744466, 0.828915 , 0.83684067, 0.83602077, 0.84312654,
0.81279038, 0.81661656, 0.80595791, 0.81361028, 0.79967204,
0.84531293, 0.9150041 , 1. ])
# find the cumlative arc lengths
data = np.vstack((x, y)).T
n = x.size
a = data[0:n-1, :]
b = data[1:n, :]
d = cdist(a, b)
arcLengths = np.diagonal(d)
ArcLengths = np.zeros_like(x)
ArcLengths[1:] = np.cumsum(arcLengths)
# model for each x and y
my_pwlf_x = pwlf.PiecewiseLinFit(ArcLengths, x)
my_pwlf_y = pwlf.PiecewiseLinFit(ArcLengths, y)
res_x = my_pwlf_x.fit(3)
res_y = my_pwlf_y.fit(3)
# generate predictions
new_arc_lengths = np.linspace(ArcLengths.min(), ArcLengths.max(), 100)
xhat = my_pwlf_x.predict(new_arc_lengths)
yhat = my_pwlf_y.predict(new_arc_lengths)
# find new breakpoints... there is two from each fit of three segments
arc_length_breaks = np.array([res_x[1], res_x[2], res_y[1], res_y[2]])
arc_length_breaks_x = my_pwlf_x.predict(arc_length_breaks)
arc_length_breaks_y = my_pwlf_y.predict(arc_length_breaks)
plt.figure()
plt.plot(x, y, 'o-', label='Origingal data')
plt.plot(xhat, yhat, '-', label="pwlf fit")
plt.plot(arc_length_breaks_x, arc_length_breaks_y, 'o', label='breakpoints')
plt.legend()
plt.figure()
plt.plot(ArcLengths, x, label='X')
plt.plot(ArcLengths, y, label='Y')
plt.plot(new_arc_lengths, xhat, label='Xhat')
plt.plot(new_arc_lengths, yhat, label='Yhat')
plt.xlabel('Arc Length')
plt.show()
When you combine the 2 fits back together, how can you keep track of:
the linear coefficients for all the intermediate pieces
They will be in separate objects, one for the x predictions, and one for the y predictions.
where the breakpoints are
In this code, I show how you can figure where the new breakpoints are. It's a bit more complicated going in. If x has three segments, and y has three segments, then you can end up with 5 new arc length segments (I think, could be wrong)
I must say I think this fitting method is very cool! It seems this would extend to higher dimensions.
That's great!
I would have to think more about the coefficients of the piecewise components. Since the fits are done on x/y vs. arclength, it is not immediately clear how to combine them together into a single set of coefficients in the original data representation. I guess it may be possible to estimate them from the breakpoints but it would be nicer (and probably more stable) to just combine the coefficients directly.
Which coefficients do you need directly?
The arc length stuff is definitely a more complicated modeling approach.
As far as doing one model, you could do fits to x as a function of y, and y as a function of x. If both x and y are normalized, then the fit that gives the lower sum of squared residuals (.ssr
) is the correct rotation to perform the model. This would only work if one of the rotations ended up being a function (as you pointed out above).
pwlf doesn't work for my case:
import numpy as np
import matplotlib.pyplot as plt
import pwlf
# your data
x = np.array([269.3048489 , 246.9444148 , 222.3136088 , 227.1488533 ,
125.1934066 , 215.8092188 , 269.9395692 , 294.6073562 ,
161.9504059 , 196.4256259 , 245.8174206 , 234.8829358 ,
239.6989843 , 159.7639371 , 81.80337907, 140.0753158 ,
69.00680592, 76.18182163, 133.0220711 , 95.4940555 ,
89.97575454, 221.3455973 , 208.2065385 , 97.75456468,
220.2748249 , 200.3221571 , 65.71646475, 78.42181787,
47.07407336, 146.9804957 , 98.75615348, 102.0246169 ,
222.8051893 , 63.80375236, 74.59112032, 219.1024445 ,
118.9913702 , 221.2607285 , 135.4354503 , 64.24703533,
93.81510511, 105.0106551 , 43.88495726, 54.49415032,
100.9858707 , 215.3200696 , 160.9971602 , 88.5518527 ,
63.08653883, 95.84675806, 131.8640344 , 58.82563146,
219.9422989 , 93.4774779 , 98.02485276, 388.5100308 ,
128.4080906 , 129.6742844 , 0. , 222.9426348 ,
56.08195898, 70.81238332, 56.71379998, 248.7115561 ,
222.2552054 , 200.081967 , 203.6329238 , 185.244671 ,
236.0417996 , 222.2993177 , 64.82775168, 89.53602526,
96.89067067, 54.2605423 , 76.72183015, 139.7620421 ,
65.3025266 ])
y = np.array([ 0. , 75.49108545, 61.51509185, 70.18320143,
364.7402165 , 109.568719 , 41.08417955, 40.83761634,
207.3589071 , 96.64997118, 32.1567632 , 34.67529359,
31.70877236, 114.413495 , 293.9381061 , 380.7518244 ,
326.403998 , 323.0019917 , 400.0171516 , 337.6460195 ,
348.2055969 , 440.1882369 , 431.5473839 , 355.7636629 ,
433.1805758 , 417.0001996 , 229.2712733 , 240.2519802 ,
290.1957078 , 393.7448968 , 343.0169598 , 334.7721919 ,
432.505317 , 326.9564546 , 229.254949 , 421.8708399 ,
197.019641 , 442.411859 , 370.5243872 , 257.5241186 ,
339.6452082 , 318.8519392 , 256.6074448 , 302.322697 ,
324.8686243 , 434.7269326 , 371.0977992 , 349.2665393 ,
293.5102796 , 341.2584752 , 377.7476104 , 296.7567181 ,
430.9114486 , 311.1336747 , 319.5093775 , 121.1779074 ,
333.5332954 , 385.9592333 , 269.3048489 , 130.2690918 ,
280.9712403 , 300.2219403 , 268.5752292 , 58.5945506 ,
132.1067479 , 193.9951334 , 170.2146256 , 132.3320591 ,
90.50234207, 61.72976525, 319.2954143 , 283.9691051 ,
301.5498668 , 306.1537013 , 249.3121043 , 164.37592 ,
224.1410448 ])
# initialize piecewise linear fit with your x and y data
my_pwlf = pwlf.PiecewiseLinFit(x, y, lapack_driver='gelsd')
# fit the data for four line segments
res = my_pwlf.fit(2)
# predict for the determined points
xHat = np.linspace(min(x), max(x), num=10000)
yHat = my_pwlf.predict(xHat)
# Get the slopes
my_slopes = my_pwlf.slopes
# Get my model parameters
beta = my_pwlf.beta
# calculate the standard errors associated with each beta parameter
se = my_pwlf.standard_errors()
# calcualte the R^2 value
Rsquared = my_pwlf.r_squared()
# calculate the piecewise R^2 value
R2values = np.zeros(my_pwlf.n_segments)
for i in range(my_pwlf.n_segments):
# segregate the data based on break point locations
xmin = my_pwlf.fit_breaks[i]
xmax = my_pwlf.fit_breaks[i+1]
xtemp = my_pwlf.x_data
ytemp = my_pwlf.y_data
indtemp = np.where(xtemp >= xmin)
xtemp = my_pwlf.x_data[indtemp]
ytemp = my_pwlf.y_data[indtemp]
indtemp = np.where(xtemp <= xmax)
xtemp = xtemp[indtemp]
ytemp = ytemp[indtemp]
# predict for the new data
yhattemp = my_pwlf.predict(xtemp)
# calcualte ssr
e = yhattemp - ytemp
ssr = np.dot(e, e)
# calculate sst
ybar = np.ones(ytemp.size) * np.mean(ytemp)
ydiff = ytemp - ybar
sst = np.dot(ydiff, ydiff)
R2values[i] = 1.0 - (ssr/sst)
# plot the results
plt.figure()
plt.plot(x, y, 'o')
plt.plot(xHat, yHat, '-')
plt.show()
@loveis98 Can you sketch on that plot what kind of fit you'd expect to that data set? Is your data following some order of occurrences? ie, does x[0] y[0] occur before x[1] y[1]
This figure is what your code spit out, and looking at the data, I can't quite see what kind of fit you'd expect.