piecewise_linear_fit_py
How to constrain breaks to integers only
Hi again,
in my application it would perform better if it had the ability to constrain the breaks to integers only.
To explain the situation: my application is a piecewise 3rd-degree polynomial linearization for some electronic sensors. I apply the polynomial coefficients to values given by a 10-bit ADC (analog-to-digital converter), which gives me unsigned integer values from 0 to 1023. Right now I am flooring (rounding down) the breaks between the linear pieces, which potentially introduces some error.
Any suggestion on how to achieve that?
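For reference, the 10-bit quantization described above can be sketched as follows (a minimal illustration; `adc_code` and the 5 V full-scale value are assumptions based on the description, not part of any library):

```python
import math

def adc_code(voltage, full_scale=5.0, bits=10):
    """Quantize a sensor voltage to an unsigned ADC code in 0 .. 2**bits - 1."""
    levels = 2 ** bits - 1  # 1023 for a 10-bit ADC
    code = math.floor(voltage / full_scale * levels)
    return max(0, min(levels, code))  # clamp to the valid range

print(adc_code(0.0))   # 0
print(adc_code(2.5))   # 511 (511.5 floored)
print(adc_code(5.0))   # 1023
```

The floor in this quantization is the same rounding-down step that motivates wanting integer breakpoints in the first place.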
You might need a different optimization algorithm for this. GA or EGO work well with integers, although they may not actually be faster. The advantage in your case would be that the optimum output will always be an integer.
I had an experimental branch called expga, where I used DEAP to build a custom GA that selects from a discrete set.
If I understand correctly, the possible breakpoints are integers in [0, 1023]?
Here is my Genetic algorithm: https://github.com/cjekel/piecewise_linear_fit_py/blob/expga/pwlf/ga.py
And here is how I called such algorithm (focus between t2 = time() and t3 = time()): https://github.com/cjekel/piecewise_linear_fit_py/blob/expga/examples/compare_fitfast_and_ga.py
You would change total_set = set(my_pwlf.x_data)
to be the set of discrete integers for possible breakpoints.
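For the integer-breakpoint case, that set could be built directly from the ADC range (a sketch; replacing the `total_set` line quoted above):

```python
# Candidate breakpoints: every integer ADC code from 0 to 1023
total_set = set(range(0, 1024))
```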
This was fairly experimental, and I didn't play too much with the hyperparameters of the GA.
Let me also point you in the direction of EGO, but I'll need some time to come up with the code.
The code in https://github.com/cjekel/piecewise_linear_fit_py/blob/expga/examples/compare_fitfast_and_ga.py uses the incorrect objective function: it should be my_pwlf.fit_with_breaks_opt on line 30. You could then use something like the following example to populate the my_pwlf parameters correctly.
I've added an example of using EGO to do this. https://github.com/cjekel/piecewise_linear_fit_py/blob/master/examples/EGO_integer_only.ipynb
Hi,
To give more details: my dataset is actually all floats, because it was collected from a digital oscilloscope. For example, for a particular current sensor I might have values in [0, 150] amperes for Y and [0, 5] volts for X. The microcontroller transforms these X values into the digital domain, representing them as 10-bit unsigned integers ([0, 1023], as you correctly understood). To linearize this and generate a polynomial that transforms the [0, 1023] values into [0, 150], I apply a scalar transformation to my X so that [0, 5] becomes [0, 1023].
One option would be to restrict the final precision, representing the resulting range with unsigned integers, but that doesn't perform very well.
Instead, I found that preserving float precision in this transformation leads to a better result in the linearization process. So my X is actually a float value, but the breaks should occur only at integers. You can see the application in detail in this notebook.
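The scalar transformation described above is just a linear rescale; a minimal sketch, with hypothetical sample values standing in for the oscilloscope data:

```python
import numpy as np

# Hypothetical oscilloscope samples in volts, in [0, 5]
x_volts = np.array([0.0, 1.25, 2.5, 5.0])

# Rescale [0, 5] V to the 10-bit ADC range [0, 1023],
# keeping float precision rather than rounding
x_adc = x_volts * (1023.0 / 5.0)
print(x_adc)  # [   0.    255.75  511.5  1023.  ]
```

Note that the rescaled X stays float-valued; only the breakpoints are constrained to integers.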
About GA:
Following your GA example (but using fit_with_breaks_opt at lines 29 and 33, as you mentioned), I was able to apply it to my dataset, but I had to delete the following lines:
total_set.remove(x.min())
total_set.remove(x.max())
It works, but only for a first-degree PiecewiseLinFit. Is this a limitation, or did I do something wrong? Here is the code.
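A hedged note on why deleting those two remove calls may have been necessary: once the float data is floored to build an integer set, x.min() and x.max() may no longer be members of that set, and set.remove raises KeyError for a missing element. set.discard is a safe alternative if the endpoints should still be excluded when present:

```python
x_min, x_max = 0.3, 1022.7  # float endpoints, as in the oscilloscope data
total_set = set(float(v) for v in range(1024))  # floored integer candidates

# set.remove raises KeyError because 0.3 is not in the set
try:
    total_set.remove(x_min)
except KeyError:
    print('remove failed: endpoint not in set')

# set.discard does nothing when the element is absent
total_set.discard(x_min)
total_set.discard(x_max)
print(len(total_set))  # still 1024
```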
About EGO: I will try to apply your example to my dataset, at first glance it seems to work very well :)
Okay, there are a couple of ways to use the GA implementation here.
In the _opt objective function (and throughout my other code) I assume the first and last breakpoints are at x.min() and x.max(). Since the first and last points aren't actually connected to other lines, this shouldn't make much difference. If you don't mind having breakpoints at x.min() and x.max(), then you can use:
import numpy as np
import matplotlib.pyplot as plt
import pwlf
from pwlf.ga import genetic_algorithm  # as defined in pwlf/ga.py on the expga branch

# x, y are your data arrays (defined elsewhere)
number_of_line_segments = 2
degree = 3
my_pwlf = pwlf.PiecewiseLinFit(x, y, degree=degree, disp_res=False)
my_pwlf.use_custom_opt(number_of_line_segments)
total_set = set(np.floor(my_pwlf.x_data))
pop, hof, stats = genetic_algorithm(total_set, my_pwlf.nVar,
                                    my_pwlf.fit_with_breaks_opt, ngen=20,
                                    mu=125, lam=250, cxpb=0.7, mutpb=0.2,
                                    tournsize=5, verbose=True)
print(hof[0])
# prepend x.min() and append x.max() to the optimized interior breaks
x_opt = [my_pwlf.x_data.min()]
x_opt += list(hof[0])
x_opt.append(my_pwlf.x_data.max())
ssr = my_pwlf.fit_with_breaks(x_opt)
print(ssr)
plt.figure()
plt.plot(x, y, label='data')
# predict on a dense grid
xHat = np.linspace(min(x), max(x), num=10000)
yHat = my_pwlf.predict(xHat)
plt.plot(xHat, yHat, label='predict')
plt.legend()
plt.show()
print('breaks: ', my_pwlf.fit_breaks)
Which would give you something like ssr: 104503.32784991479 breaks: [ 32.90733248 40. 748.29923958]
However, if x.min() and x.max() must indeed come from your set, then we need to make a couple of changes.
number_of_line_segments = 2
degree = 3
my_pwlf = pwlf.PiecewiseLinFit(x, y, degree=degree, disp_res=False)
my_pwlf.use_custom_opt(number_of_line_segments)
total_set = set(np.floor(my_pwlf.x_data))
# optimize all breakpoints (including the outer two), so use
# fit_with_breaks and ask the GA for number_of_line_segments + 1 variables
pop, hof, stats = genetic_algorithm(total_set, number_of_line_segments + 1,
                                    my_pwlf.fit_with_breaks, ngen=20,
                                    mu=125, lam=250, cxpb=0.7, mutpb=0.2,
                                    tournsize=5, verbose=True)
print(hof[0])
ssr = my_pwlf.fit_with_breaks(list(hof[0]))
plt.figure()
plt.plot(x, y, label='data')
# predict on a dense grid
xHat = np.linspace(min(x), max(x), num=10000)
yHat = my_pwlf.predict(xHat)
plt.plot(xHat, yHat, label='predict')
plt.legend()
plt.show()
print(ssr)
print('breaks: ', my_pwlf.fit_breaks)
which gave something like ssr: 123888.40048433887 breaks: [ 34. 40. 187.]
Hopefully this helps.
Also, here is the documentation on the GA scheme: https://deap.readthedocs.io/en/master/api/algo.html#deap.algorithms.eaMuPlusLambda
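The (mu + lambda) scheme that eaMuPlusLambda implements can be sketched without DEAP: each generation produces lambda offspring by varying parents, then the next population is the best mu individuals selected from parents plus offspring. A mutation-only toy version on the integer-breakpoint idea (the quadratic objective is only an illustration, not pwlf's SSR):

```python
import random

random.seed(0)

def objective(breaks):
    # Toy stand-in for an SSR: the best interior breakpoint is 400
    return sum((b - 400) ** 2 for b in breaks)

candidates = list(range(1, 1023))  # interior integer breakpoints

def mutate(ind):
    # Replace one gene with a random candidate breakpoint
    out = list(ind)
    out[random.randrange(len(out))] = random.choice(candidates)
    return out

mu, lam, ngen, n_var = 20, 40, 50, 1
pop = [[random.choice(candidates) for _ in range(n_var)] for _ in range(mu)]
for gen in range(ngen):
    offspring = [mutate(random.choice(pop)) for _ in range(lam)]
    # (mu + lambda): keep the best mu from parents AND offspring
    pop = sorted(pop + offspring, key=objective)[:mu]

best = pop[0]
print(best, objective(best))
```

Because selection draws from parents and offspring together, the best solution found so far can never be lost, which is one reason this scheme is a reasonable default for small discrete problems.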
Thank you so much, now I understand, and I was able to adapt it to my needs. Check it again; I think it really works well in this case.
I still want to look at EGO anyway, do you think it would perform better?
In this example the dataset is downsampled (using a simple average) by a factor of 1000. Do you think more data would lead to more accurate results?
If you can afford to perform the fit with more data, you can check whether the results differ from the down-sampled results. I wouldn't know without trying.
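For reference, the simple-average downsampling mentioned above can be sketched as a block average (a minimal version; `downsample_mean` is a hypothetical helper name, and any ragged tail shorter than the factor is dropped):

```python
import numpy as np

def downsample_mean(signal, factor):
    """Block-average: each output sample is the mean of `factor` inputs."""
    n = len(signal) - len(signal) % factor  # drop the ragged tail
    return signal[:n].reshape(-1, factor).mean(axis=1)

sig = np.arange(10, dtype=float)      # 0, 1, ..., 9
print(downsample_mean(sig, 5))        # [2. 7.]
```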
> I still want to look at EGO anyway, do you think it would perform better?
Either should work; it's hard to say which will be better. If I had to pick one, it would probably be the GA.