
Any suggestions on how to make it fit faster for large arrays?

Open doronbehar opened this issue 11 months ago • 5 comments

In my project, I have ~240 different x,y data pairs, of lengths ~1500 and ~3000, to which I want to apply a PiecewiseLinFit. Apparently, this is a bit slow. Here's a histogram:

[Image: histogram of __init__ and fitfast run times]

Generated with:

#!/usr/bin/env python

import numpy as np
import matplotlib.pyplot as plt

# Load the three-column data: length, __init__ time, fitfast time
lengths, init_times, fitfast_times = np.loadtxt("times.txt", unpack=True)

# Create a single figure and axis
plt.figure(figsize=(8, 5))

# Plot both histograms on the same axes
plt.hist(init_times,    bins=20, alpha=0.6, color='steelblue',  label=' __init__')
plt.hist(fitfast_times, bins=20, alpha=0.6, color='darkorange', label='fitfast')

# Add labels and legend
plt.xlabel("time [s]")
plt.ylabel("Frequency")
plt.legend()
# Save the plot
plt.savefig("example.png")

times.txt is attached. I'm not sure how the data length should be expected to affect the speed of fitfast.
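For context, a loop along these lines can produce such a times.txt (just a sketch: `datasets` is a placeholder for the ~240 x,y pairs, and the two-segment fitfast call matches the discussion below):

import time

import numpy as np
import pwlf

rows = []
for x, y in datasets:  # placeholder for the ~240 (x, y) pairs
    t0 = time.perf_counter()
    model = pwlf.PiecewiseLinFit(x, y)
    t1 = time.perf_counter()
    model.fitfast(2)  # two segments, per the discussion below
    t2 = time.perf_counter()
    rows.append((len(x), t1 - t0, t2 - t1))

np.savetxt("times.txt", rows)  # columns: length, __init__ time, fitfast time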

While I was trying to improve the situation, I found:

https://github.com/cjekel/piecewise_linear_fit_py/blob/33c9eecb5e014080c139ab860549fc947ddc07cb/pwlf/pwlf.py#L933-L940

And I am not sure how to decrease this number (to 1, I guess) - is there a specific keyword argument for that?

Also, I read this thread, and I'm not sure what I can try - that long and good explanation applies more to fit than to fitfast...

Thanks for any help!

doronbehar avatar May 05 '25 18:05 doronbehar

How many line segments are those fits?

You should be able to run fitfast with pop=1. If that doesn't work, it's basically like running fit_guess, where you pick some breakpoints to start from.

With more than a thousand data points, I would think that finding the best breakpoint locations doesn't matter that much. fit_with_breaks will be faster if you are okay with predetermining where the breakpoints should be. An easy way of picking those would be to just assume equal-length segments.
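For example, something along these lines (just a sketch, assuming two segments, with x and y being one of your datasets):

import numpy as np
import pwlf

model = pwlf.PiecewiseLinFit(x, y)  # x, y: one of your datasets

# Multi-start L-BFGS-B with a single starting point:
breaks = model.fitfast(2, pop=1)

# Or start L-BFGS-B from an explicit guess for the interior breakpoint:
breaks = model.fit_guess([(x.min() + x.max()) / 2])

# Or skip the optimization entirely by fixing the breakpoints up front;
# equal-length segments are an easy default (endpoints must be included):
ssr = model.fit_with_breaks(np.linspace(x.min(), x.max(), 3))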

cjekel avatar May 06 '25 03:05 cjekel

How many line segments are those fits?

fit_with_breaks will be faster if you are okay with predetermining where the breakpoints should be.

Not for my data, unfortunately :/. I wouldn't mind sharing information about it if that's of any interest, but I suppose it'd be too much for you.

You should be able to run fitfast with pop=1. If that doesn't work, it's basically like running fit_guess, where you pick some breakpoints to start from.

With more than a thousand data points, I would think that finding the best breakpoint locations doesn't matter that much.

So, I had a slight suspicion that the .p_values method was consuming the time, but I guess not. I also compared all of the methods you mention; fit_guess was given the middle of my x_data as a breakpoint, and I got the following histograms:

[Image: histograms comparing the run times of fit, fitfast (pop=1), and fit_guess]

(It is an SVG now, so you should be able to open it in your viewer and look at it more closely.)

Interestingly, fit is faster than fitfast 😄, at least for this data and this number of breakpoints. Using pop=1 indeed improves it a little bit, and using fit_guess improves it slightly further.

doronbehar avatar May 06 '25 15:05 doronbehar

With that many data points and two segments, I would want to do one of the optimizations that you've done.

Thanks for sharing the timings. It's honestly a bit faster than I would suspect.

One thing you can do is manually specify some of the optimization convergence behavior, or limit the number of function calls/iterations; that will help bring in the tail of your run times.
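For example (a sketch: factr, pgtol, and maxfun are standard scipy.optimize.fmin_l_bfgs_b options, assuming pwlf forwards extra keyword arguments through to it):

# Loosen the convergence tolerances and cap the number of objective
# evaluations; the kwargs are assumed to reach fmin_l_bfgs_b.
breaks = model.fitfast(
    2,
    pop=1,
    factr=1e10,  # coarser relative tolerance (scipy's default is 1e7)
    pgtol=1e-3,  # looser gradient tolerance (scipy's default is 1e-5)
    maxfun=100,  # hard cap on objective function evaluations
)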

You do have a bit of an embarrassingly parallel problem, with that many datasets running the same routine. If you have access to a system with many CPU cores, you could try to run the fits in parallel.

cjekel avatar May 07 '25 16:05 cjekel

Thanks for sharing the timings. It's honestly a bit faster than I would suspect.

You are welcome. I'm not sure if I have the time to contribute, but it'd be nice to save data like mine in the repository for benchmarking. I believe it could be valuable in the future, if new optional optimization features are added. I was thinking mostly of Numba, and I also looked briefly at Nevergrad. Would you be interested in such a contribution? If so, in which format: TXT, npz, npy, or HDF5?

You do have a bit of an embarrassingly parallel problem, with that many datasets running the same routine. If you have access to a system with many CPU cores, you could try to run the fits in parallel.

Yes, I was thinking about it, and in a certain sense I sort of tried it, but the data is located in multiple HDF5 files, and loading it all into memory in parallel has its drawbacks too...
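Something like this might sidestep that, though (a sketch: the file and dataset names are made up, and each worker opens only its own HDF5 file, so the whole collection is never in memory at once):

from multiprocessing import Pool

import h5py
import numpy as np
import pwlf

def fit_one(path):
    # Each worker loads just one file; "x"/"y" dataset names are made up.
    with h5py.File(path, "r") as f:
        x = np.asarray(f["x"])
        y = np.asarray(f["y"])
    model = pwlf.PiecewiseLinFit(x, y)
    return model.fitfast(2, pop=1)  # the fitted breakpoint locations

if __name__ == "__main__":
    paths = ["data_000.h5", "data_001.h5"]  # placeholder file names
    with Pool(processes=4) as pool:
        all_breaks = pool.map(fit_one, paths)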

doronbehar avatar May 08 '25 16:05 doronbehar

Today, I noticed that it was so slow for me only when some CPU cores of the system were under heavy use!

This led me to investigate whether I can limit the number of workers for fmin_l_bfgs_b, but apparently that's not possible directly. Then I asked an LLM, and with it I wrote this file:

"""

Many NumPy and SciPy operations use C libraries that spread mathematical
work across multiple CPU cores. These operations can become very slow if
some of the CPU cores are already under heavy use by another process: the
work is split evenly across cores, so the cores that are already busy
return their results much later than the free ones.

This phenomenon is especially significant when `scipy.optimize.fmin_l_bfgs_b`
is used with large vectors; other functions are possibly affected as well.

This file, which must be imported before any `scipy` or `numpy` function or
module is imported, helps reduce the effect of this phenomenon by setting
appropriate environment variables that are read by the C libraries of
`scipy` and `numpy`.

"""

import os
import psutil

USAGE_THRESHOLD = 90 # In %
AVAILABLE_CPU_CORES = str(max(
    # Credit to Claude.ai
    psutil.cpu_count(logical=True) - sum(
        1 for usage in psutil.cpu_percent(interval=0.5, percpu=True)
        if usage >= USAGE_THRESHOLD
    ),
    1,
))
# Thread-count variables read by OpenMP, OpenBLAS, MKL, Accelerate (veclib),
# and numexpr:
for env_var in [
    "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
    "VECLIB_MAXIMUM_THREADS",
    "NUMEXPR_NUM_THREADS",
]:
    os.environ[env_var] = AVAILABLE_CPU_CORES
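
For it to take effect, the file has to be imported before anything that pulls in numpy or scipy, e.g. (assuming it is saved as limit_threads.py - the name is arbitrary):

# main.py - the limiter must come first, because the *_NUM_THREADS
# variables are only read when numpy/scipy's C libraries first load.
import limit_threads  # noqa: F401  (hypothetical name for the file above)

import numpy as np
import pwlf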

I wonder whether this knowledge and understanding could be contributed here or elsewhere 🤔...

doronbehar avatar May 13 '25 10:05 doronbehar