piecewise_linear_fit_py
How to constrain breaks to integers only
Hi again,
in my application it would perform better if it had the ability to constrain the breaks to integers only.
To explain the situation: my application is a piecewise 3rd-degree polynomial linearization for some electronic sensors. I apply the polynomial coefficients to values given by a 10-bit ADC (analog-to-digital converter), which gives me unsigned integer values from 0 to 1023. Right now I am flooring (rounding down) the breaks between the linear pieces, which potentially introduces some error.
Any suggestion on how to achieve that?
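For reference, the 10-bit quantization described above can be sketched as follows (a minimal illustration; `adc_code` and the 5 V full-scale value are assumptions based on the description, not part of any library):

```python
import math

def adc_code(voltage, full_scale=5.0, bits=10):
    """Quantize a sensor voltage to an unsigned ADC code in 0 .. 2**bits - 1."""
    levels = 2 ** bits - 1  # 1023 for a 10-bit ADC
    code = math.floor(voltage / full_scale * levels)
    return max(0, min(levels, code))  # clamp to the valid range

print(adc_code(0.0))   # 0
print(adc_code(2.5))   # 511 (511.5 floored)
print(adc_code(5.0))   # 1023
```

The floor in this quantization is the same rounding-down step that motivates wanting integer breakpoints in the first place.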
You might need a different optimization algorithm for this. GA or EGO work well with integers, although they may not actually be faster. The advantage in your case would be that the optimum output will always be an integer.
I had an experimental branch called expga, where I used DEAP to build a custom GA that selects from a discrete set.
If I understand correctly, the possible breakpoints are integers in [0, 1023]?
Here is my Genetic algorithm: https://github.com/cjekel/piecewise_linear_fit_py/blob/expga/pwlf/ga.py
And here is how I called such algorithm (focus between t2 = time() and t3 = time()): https://github.com/cjekel/piecewise_linear_fit_py/blob/expga/examples/compare_fitfast_and_ga.py
You would change total_set = set(my_pwlf.x_data)
to be the set of discrete integers for possible breakpoints.
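For the integer-breakpoint case, that set could be built directly from the ADC range (a sketch; replacing the `total_set` line quoted above):

```python
# Candidate breakpoints: every integer ADC code from 0 to 1023
total_set = set(range(0, 1024))
```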
This was fairly experimental, and I didn't play too much with the hyperparameters of the GA.
Let me also point you in the direction of EGO, but I'll need some time to come up with the code.
The code in https://github.com/cjekel/piecewise_linear_fit_py/blob/expga/examples/compare_fitfast_and_ga.py uses the incorrect objective function: it should be my_pwlf.fit_with_breaks_opt on line 30. You could then use something like the following example to populate the my_pwlf parameters correctly.
I've added an example of using EGO to do this. https://github.com/cjekel/piecewise_linear_fit_py/blob/master/examples/EGO_integer_only.ipynb
Hi,
To give more details: my dataset is actually all floats, because it was collected from a digital oscilloscope. For example, for a particular current sensor I might have values in [0, 150] amperes for Y and [0, 5] volts for X. The microcontroller transforms these X values into the digital domain, representing them as 10-bit unsigned integers ([0, 1023], as you correctly understood). To linearize this and generate a polynomial that transforms the [0, 1023] values into [0, 150], I apply a scalar transformation to my X so that [0, 5] becomes [0, 1023].
One option would be to restrict the final precision, representing the resulting range with unsigned integers, but that doesn't perform very well.
Instead, I found that preserving float precision in this transformation leads to a better result in the linearization process. So my X is actually a float value, but the breaks should occur only at integers. You can see the application in detail in this notebook.
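The scalar transformation described above is just a linear rescale; a minimal sketch, with hypothetical sample values standing in for the oscilloscope data:

```python
import numpy as np

# Hypothetical oscilloscope samples in volts, in [0, 5]
x_volts = np.array([0.0, 1.25, 2.5, 5.0])

# Rescale [0, 5] V to the 10-bit ADC range [0, 1023],
# keeping float precision rather than rounding
x_adc = x_volts * (1023.0 / 5.0)
print(x_adc)  # [   0.    255.75  511.5  1023.  ]
```

Note that the rescaled X stays float-valued; only the breakpoints are constrained to integers.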
About GA:
Following your GA example (but using fit_with_breaks_opt at lines 29 and 33, as you mentioned), I was able to apply it to my dataset, but I had to delete the following lines:
total_set.remove(x.min())
total_set.remove(x.max())
It works, but only for a first-degree PiecewiseLinFit. Is this a limitation, or did I do something wrong? Here is the code.
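A hedged note on why deleting those two remove calls may have been necessary: once the float data is floored to build an integer set, x.min() and x.max() may no longer be members of that set, and set.remove raises KeyError for a missing element. set.discard is a safe alternative if the endpoints should still be excluded when present:

```python
x_min, x_max = 0.3, 1022.7  # float endpoints, as in the oscilloscope data
total_set = set(float(v) for v in range(1024))  # floored integer candidates

# set.remove raises KeyError because 0.3 is not in the set
try:
    total_set.remove(x_min)
except KeyError:
    print('remove failed: endpoint not in set')

# set.discard does nothing when the element is absent
total_set.discard(x_min)
total_set.discard(x_max)
print(len(total_set))  # still 1024
```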
About EGO: I will try to apply your example to my dataset, at first glance it seems to work very well :)
Okay, there are a couple of ways to use the GA implementation here.
In the _opt objective function (and throughout my other code) I assume the first and last breakpoints are at x.min() and x.max(). Since the first and last points aren't actually connected to other lines, this shouldn't make much difference. If you don't mind having breakpoints at x.min() and x.max(), then you can use:
import numpy as np
import matplotlib.pyplot as plt
import pwlf
from pwlf.ga import genetic_algorithm  # as defined in pwlf/ga.py on the expga branch

# x, y are your data arrays (defined elsewhere)
number_of_line_segments = 2
degree = 3
my_pwlf = pwlf.PiecewiseLinFit(x, y, degree=degree, disp_res=False)
my_pwlf.use_custom_opt(number_of_line_segments)
total_set = set(np.floor(my_pwlf.x_data))
pop, hof, stats = genetic_algorithm(total_set, my_pwlf.nVar,
                                    my_pwlf.fit_with_breaks_opt, ngen=20,
                                    mu=125, lam=250, cxpb=0.7, mutpb=0.2,
                                    tournsize=5, verbose=True)
print(hof[0])
# prepend x.min() and append x.max() to the optimized interior breaks
x_opt = [my_pwlf.x_data.min()]
x_opt += list(hof[0])
x_opt.append(my_pwlf.x_data.max())
ssr = my_pwlf.fit_with_breaks(x_opt)
print(ssr)
plt.figure()
plt.plot(x, y, label='data')
# predict on a dense grid
xHat = np.linspace(min(x), max(x), num=10000)
yHat = my_pwlf.predict(xHat)
plt.plot(xHat, yHat, label='predict')
plt.legend()
plt.show()
print('breaks: ', my_pwlf.fit_breaks)
Which would give you something like ssr: 104503.32784991479 breaks: [ 32.90733248 40. 748.29923958]
However, if x.min() and x.max() must indeed come from your set, then we need to make a couple of changes.
number_of_line_segments = 2
degree = 3
my_pwlf = pwlf.PiecewiseLinFit(x, y, degree=degree, disp_res=False)
my_pwlf.use_custom_opt(number_of_line_segments)
total_set = set(np.floor(my_pwlf.x_data))
# optimize all breakpoints (including the outer two), so use
# fit_with_breaks and ask the GA for number_of_line_segments + 1 variables
pop, hof, stats = genetic_algorithm(total_set, number_of_line_segments + 1,
                                    my_pwlf.fit_with_breaks, ngen=20,
                                    mu=125, lam=250, cxpb=0.7, mutpb=0.2,
                                    tournsize=5, verbose=True)
print(hof[0])
ssr = my_pwlf.fit_with_breaks(list(hof[0]))
plt.figure()
plt.plot(x, y, label='data')
# predict on a dense grid
xHat = np.linspace(min(x), max(x), num=10000)
yHat = my_pwlf.predict(xHat)
plt.plot(xHat, yHat, label='predict')
plt.legend()
plt.show()
print(ssr)
print('breaks: ', my_pwlf.fit_breaks)
which gave something like ssr: 123888.40048433887 breaks: [ 34. 40. 187.]
Hopefully this helps.
Also, here is the documentation on the GA scheme: https://deap.readthedocs.io/en/master/api/algo.html#deap.algorithms.eaMuPlusLambda
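The (mu + lambda) scheme that eaMuPlusLambda implements can be sketched without DEAP: each generation produces lambda offspring by varying parents, then the next population is the best mu individuals selected from parents plus offspring. A mutation-only toy version on the integer-breakpoint idea (the quadratic objective is only an illustration, not pwlf's SSR):

```python
import random

random.seed(0)

def objective(breaks):
    # Toy stand-in for an SSR: the best interior breakpoint is 400
    return sum((b - 400) ** 2 for b in breaks)

candidates = list(range(1, 1023))  # interior integer breakpoints

def mutate(ind):
    # Replace one gene with a random candidate breakpoint
    out = list(ind)
    out[random.randrange(len(out))] = random.choice(candidates)
    return out

mu, lam, ngen, n_var = 20, 40, 50, 1
pop = [[random.choice(candidates) for _ in range(n_var)] for _ in range(mu)]
for gen in range(ngen):
    offspring = [mutate(random.choice(pop)) for _ in range(lam)]
    # (mu + lambda): keep the best mu from parents AND offspring
    pop = sorted(pop + offspring, key=objective)[:mu]

best = pop[0]
print(best, objective(best))
```

Because selection draws from parents and offspring together, the best solution found so far can never be lost, which is one reason this scheme is a reasonable default for small discrete problems.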
Thank you so much, now I understand, and I was able to adapt it to my needs. Check it again; I think it really works well in this case.
I still want to look at EGO anyway, do you think it would perform better?
In this example the dataset is downsampled (using a simple average) by a factor of 1000. Do you think more data would lead to more accurate results?
If you can afford to perform the fit with more data, you can check whether the results differ from the down-sampled results. I wouldn't know without trying.
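For reference, the simple-average downsampling mentioned above can be sketched as a block average (a minimal version; `downsample_mean` is a hypothetical helper name, and any ragged tail shorter than the factor is dropped):

```python
import numpy as np

def downsample_mean(signal, factor):
    """Block-average: each output sample is the mean of `factor` inputs."""
    n = len(signal) - len(signal) % factor  # drop the ragged tail
    return signal[:n].reshape(-1, factor).mean(axis=1)

sig = np.arange(10, dtype=float)      # 0, 1, ..., 9
print(downsample_mean(sig, 5))        # [2. 7.]
```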
> I still want to look at EGO anyway, do you think it would perform better?
Either should work; it's hard to say which will be better. If I had to pick one, it would probably be the GA.