Incorrect fitting results when n_fits is very large
Hi, I'm encountering an issue when running gpufit with a large number of fits.
When n_fits = 1,000,000 or 1,700,000, the fitting results are incorrect, but with n_fits = 1,600,000 they are correct. The outcome does not change monotonically with the number of fits, which may point to a memory-related bug or an integer overflow.
Below is a minimal example to reproduce the issue:
##########################################
"""
import numpy as np
import pygpufit.gpufit as gf

print('CUDA available: {}'.format(gf.cuda_available()))
print('CUDA versions runtime: {}, driver: {}'.format(*gf.get_cuda_version()))

def test(n_fits):
    params = np.random.rand(n_fits, 2).astype(np.float32)
    x = np.random.rand(n_fits, 300).astype(np.float32) * 100
    y = params[:, :1] + params[:, 1:] * x
    init_params = np.array([0.1, 0.1], dtype=np.float32)
    init_params = np.tile(init_params, (n_fits, 1))
    results = gf.fit(data=y,
                     weights=None,
                     model_id=gf.ModelID.LINEAR_1D,
                     initial_parameters=init_params,
                     tolerance=1e-8,
                     user_info=x)
    print("n_fits=", n_fits, "true:", params[-1], "preds:", results[0][-1])

test(1000000)
test(1600000)
test(1700000)
"""
#########################################
Output:
n_fits= 1000000 true: [0.73556924 0.9035081 ] preds: [ 4.8269367e+01 -8.0045573e-03]
n_fits= 1600000 true: [0.08675532 0.55433005] preds: [0.086756 0.55433005]
n_fits= 1700000 true: [0.61472124 0.9756066 ] preds: [54.71934 -0.07528704]
As shown, for n_fits = 1,000,000 and 1,700,000, the results are clearly incorrect, while 1,600,000 gives the expected values. The model being used is LINEAR_1D, which is normally very stable, so this behavior is unexpected.
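The reproduction above prints only the last fit of each batch, so it cannot show whether every fit is wrong or only some of them. A minimal sketch of a batch-wide check follows. `summarize_fits` and its `tol` threshold are hypothetical names introduced here, the synthetic arrays stand in for the parameter and state arrays that `gf.fit` returns, and state 0 is assumed to mean "converged".

```python
import numpy as np

def summarize_fits(true_params, fitted_params, states, tol=1e-3):
    """Summarize fit quality over the whole batch instead of a single fit."""
    abs_err = np.abs(np.asarray(true_params) - np.asarray(fitted_params))
    # A fit is "bad" if any parameter is off by more than tol; NaN also counts as bad.
    bad = np.any(~(abs_err <= tol), axis=1)
    bad_idx = np.flatnonzero(bad)
    values, counts = np.unique(states, return_counts=True)
    return {
        "n_fits": int(len(bad)),
        "n_bad": int(bad.sum()),
        "first_bad": int(bad_idx[0]) if bad_idx.size else None,
        "last_bad": int(bad_idx[-1]) if bad_idx.size else None,
        "max_abs_error": float(abs_err.max()),
        "state_counts": {int(v): int(c) for v, c in zip(values, counts)},
    }

# Synthetic stand-in for gf.fit output: the last 3 of 10 fits are corrupted.
rng = np.random.default_rng(0)
true_params = rng.random((10, 2)).astype(np.float32)
fitted = true_params.copy()
fitted[7:] += 50.0
states = np.zeros(10, dtype=np.int32)

summary = summarize_fits(true_params, fitted, states)
print(summary)
```

If the bad fits turn out to cluster in one index range (for example only at the end of the batch), that would hint at an indexing or transfer problem rather than at the fitting routine itself.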
Thanks for reporting this issue. Could you please tell us which OS and which hardware you are using?
NAME="Ubuntu"
VERSION="20.04.3 LTS (Focal Fossa)"
Hi,
I ran this code to test my installation and cannot reproduce the problem. My fit results look fine:
import numpy as np
import pygpufit.gpufit as gf

print('CUDA available: {}'.format(gf.cuda_available()))
print('CUDA versions runtime: {}, driver: {}'.format(*gf.get_cuda_version()))

def test(n_fits):
    params = np.random.rand(n_fits, 2).astype(np.float32)
    x = np.random.rand(n_fits, 300).astype(np.float32) * 100
    y = params[:, :1] + params[:, 1:] * x
    init_params = np.array([0.1, 0.1], dtype=np.float32)
    init_params = np.tile(init_params, (n_fits, 1))
    parameters, states, chi_squares, number_iterations, execution_time = gf.fit(
        data=y,
        weights=None,
        model_id=gf.ModelID.LINEAR_1D,
        initial_parameters=init_params,
        tolerance=1e-8,
        user_info=x
    )
    print(f"\nn_fits= {n_fits:e}, true: {params[-1]}, preds: {parameters[-1]}, delta:, {abs(params[-1]-parameters[-1])}")
    print("states:", states[-1], "chi_squares:", chi_squares[-1],
          "iterations:", number_iterations[-1], "time:", execution_time)
    return parameters, states, chi_squares, number_iterations, execution_time

test(1000000)
test(1600000)
test(1700000)
output is
CUDA available: True
CUDA versions runtime: (12, 0), driver: (12, 4)
n_fits= 1.000000e+06, true: [0.7809187 0.7692733], preds: [0.78091776 0.7692733 ], delta:, [9.536743e-07 0.000000e+00]
states: 0 chi_squares: 1.8235937e-09 iterations: 4 time: 1.1855392100000017
n_fits= 1.600000e+06, true: [0.828089 0.38862136], preds: [0.8280882 0.3886214], delta:, [7.7486038e-07 2.9802322e-08]
states: 0 chi_squares: 3.8443915e-10 iterations: 4 time: 1.7962752590000264
n_fits= 1.700000e+06, true: [0.99378943 0.7045424 ], preds: [0.9937897 0.7045424], delta:, [2.3841858e-07 0.0000000e+00]
states: 0 chi_squares: 6.031087e-11 iterations: 4 time: 1.9028417580000223
Sysinfo:
cat /etc/os-release
>>>PRETTY_NAME="Ubuntu 24.04.2 LTS"
>>>NAME="Ubuntu"
>>>VERSION_ID="24.04"
>>>VERSION="24.04.2 LTS (Noble Numbat)"
>>>VERSION_CODENAME=noble
>>>ID=ubuntu
>>>ID_LIKE=debian
>>>HOME_URL="https://www.ubuntu.com/"
>>>SUPPORT_URL="https://help.ubuntu.com/"
>>>BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
>>>PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
>>>UBUNTU_CODENAME=noble
>>>LOGO=ubuntu-logo
uname -r
>>>6.14.0-24-generic
nvidia-smi
>>>Wed Jul 23 17:49:45 2025
>>>+-----------------------------------------------------------------------------------------+
>>>| NVIDIA-SMI 550.163.01 Driver Version: 550.163.01 CUDA Version: 12.4 |
>>>|-----------------------------------------+------------------------+----------------------+
>>>| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
>>>| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
>>>| | | MIG M. |
>>>|=========================================+========================+======================|
>>>| 0 NVIDIA GeForce RTX 3050 Off | 00000000:01:00.0 Off | N/A |
>>>| 0% 41C P8 N/A / 115W | 9MiB / 8192MiB | 0% Default |
>>>| | | N/A |
>>>+-----------------------------------------+------------------------+----------------------+
>>>
>>>+-----------------------------------------------------------------------------------------+
>>>| Processes: |
>>>| GPU GI CI PID Type Process name GPU Memory |
>>>| ID ID Usage |
>>>|=========================================================================================|
>>>| 0 N/A N/A 1635 G /usr/lib/xorg/Xorg 4MiB |
>>>+-----------------------------------------------------------------------------------------+
cat /proc/cpuinfo | grep "model name"
>>>model name : AMD Ryzen 5 5600G with Radeon Graphics
I'm pretty sure I ran into this or a similar issue before, and it was caused by floating-point rounding errors propagating through the fit. I forget how I tested it; maybe try double precision instead of float32? The inaccuracies showed up at random as well, so run it a few times to be sure. Beware of memory limits when you do this.
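One way to probe the precision hypothesis without touching the GPU code is to compute a double-precision reference on the CPU and compare the GPU results against it. The sketch below uses the closed-form least-squares solution for y = p0 + p1 * x, evaluated per row in float64. `linear_fit_reference` is a hypothetical helper, not part of Gpufit, and it assumes unweighted fits as in the reproduction script.

```python
import numpy as np

def linear_fit_reference(x, y):
    """Closed-form unweighted least squares for y = p0 + p1 * x, one fit per row, in float64."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    x_mean = x.mean(axis=1, keepdims=True)
    y_mean = y.mean(axis=1, keepdims=True)
    dx = x - x_mean
    slope = (dx * (y - y_mean)).sum(axis=1) / (dx * dx).sum(axis=1)
    offset = y_mean[:, 0] - slope * x_mean[:, 0]
    return np.column_stack([offset, slope])   # same (offset, slope) order as params in the script

# Small batch generated the same way as in the reproduction script.
rng = np.random.default_rng(1)
n_fits = 1000
params = rng.random((n_fits, 2)).astype(np.float32)
x = (rng.random((n_fits, 300)) * 100).astype(np.float32)
y = params[:, :1] + params[:, 1:] * x

reference = linear_fit_reference(x, y)
max_dev = np.max(np.abs(reference - params))
print(f"max deviation of float64 reference from true parameters: {max_dev:.2e}")
```

GPU results that deviate from this reference by whole units, as in the original report, would be hard to attribute to float32 rounding alone.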
I checked again. With pyGpufit it is not possible to use a data type other than np.float32 (see https://github.com/gpufit/Gpufit/issues/144).
I still cannot reproduce the issue.
If the number of fits is too large, this test crashes because there is not enough system memory (RAM) to allocate the NumPy arrays. In my case the for loop crashed at n=4, i.e. 4e6 fits.
import numpy as np
import pygpufit.gpufit as gf

def test(n_fits, dtype=np.float32):
    params = np.random.rand(n_fits, 2).astype(dtype)
    x = np.random.rand(n_fits, 300).astype(dtype) * 100
    y = params[:, :1] + params[:, 1:] * x
    init_params = np.array([0.1, 0.1], dtype=dtype)
    init_params = np.tile(init_params, (n_fits, 1))
    parameters, states, chi_squares, number_iterations, execution_time = gf.fit(
        data=y,
        weights=None,
        model_id=gf.ModelID.LINEAR_1D,
        initial_parameters=init_params,
        tolerance=1e-8,
        user_info=x
    )
    # Check mean absolute error
    error = np.mean(abs(params - parameters))
    print(f"\n{n_fits:2.2e} fits with dtype: {dtype} -> mean abs error = {error:2.2e}")
    testindex = np.random.randint(0, n_fits)
    error_i = abs(params[testindex] - parameters[testindex])
    print(f"\tchi_squares: {chi_squares[testindex]:2.2e}"
          + f"\titerations: {number_iterations[testindex]:3d}"
          + f"\ttime: {execution_time:3.3f} ms")
    print(f"\tCheck fit result for random index: #{testindex}: "
          + f"\n\t\ttrue:\t{params[testindex][0]:2.3f}, {params[testindex][1]:2.3f}"
          + f"\n\t\tpreds:\t{parameters[testindex][0]:2.3f}, {parameters[testindex][1]:2.3f}"
          + f"\n\t\terror:\t{error_i[0]:2.3e}, {error_i[1]:2.3e}")

for n in range(1, 10):
    test(int(n*1e6))
output
1.00e+06 fits with dtype: <class 'numpy.float32'> -> mean abs error = 1.73e-07
chi_squares: 3.92e-10 iterations: 4 time: 1.260 ms
Check fit result for random index: #769649:
true: 0.017, 0.535
preds: 0.017, 0.535
error: 6.892e-08, 0.000e+00
2.00e+06 fits with dtype: <class 'numpy.float32'> -> mean abs error = 1.73e-07
chi_squares: 5.16e-11 iterations: 4 time: 2.236 ms
Check fit result for random index: #1026410:
true: 0.457, 0.330
preds: 0.457, 0.330
error: 1.490e-07, 0.000e+00
3.00e+06 fits with dtype: <class 'numpy.float32'> -> mean abs error = 1.72e-07
chi_squares: 2.41e-09 iterations: 4 time: 3.349 ms
Check fit result for random index: #1183742:
true: 0.815, 0.960
preds: 0.815, 0.960
error: 1.132e-06, 5.960e-08
Killed
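The `Killed` line is consistent with the host running out of RAM rather than with a GPU fault. Below is a rough estimate of the host memory the test script needs, assuming 300 points per fit as in the script and counting only the two large arrays. `estimate_host_bytes` is a hypothetical helper. Actual peak usage also depends on further NumPy temporaries and on any copies made inside pyGpufit, which are not modeled here.

```python
def estimate_host_bytes(n_fits, n_points=300, itemsize=4):
    """Rough host-memory estimate (in bytes) for the large arrays in the test script."""
    elements = n_fits * n_points
    persistent = 2 * elements * itemsize                 # x and y, both alive while gf.fit runs
    rand_temp = elements * 8                             # np.random.rand always returns float64
    peak_creating_x = rand_temp + elements * itemsize    # float64 temporary plus its float32 copy
    return {"persistent": persistent, "peak_creating_x": peak_creating_x}

for n_millions in range(1, 5):
    est = estimate_host_bytes(n_millions * 1_000_000)
    print(f"{n_millions}e6 fits: x and y hold {est['persistent'] / 1e9:.1f} GB, "
          f"creating x peaks at about {est['peak_creating_x'] / 1e9:.1f} GB")
```

For 4e6 fits this gives 9.6 GB for x and y alone, which would plausibly exhaust a desktop machine once the temporaries are added.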
Here is a small script that runs @DDAWX's problem size many times to check whether a large error shows up at some point. All results are fine.
import numpy as np
import pygpufit.gpufit as gf

def test(n_fits: float = 1e6) -> float:
    n_fits = int(n_fits)
    params = np.random.rand(n_fits, 2).astype(np.float32)
    x = np.random.rand(n_fits, 300).astype(np.float32) * 100
    y = params[:, :1] + params[:, 1:] * x
    init_params = np.array([0.1, 0.1], dtype=np.float32)
    init_params = np.tile(init_params, (n_fits, 1))
    parameters, states, chi_squares, number_iterations, execution_time = gf.fit(
        data=y,
        weights=None,
        model_id=gf.ModelID.LINEAR_1D,
        initial_parameters=init_params,
        tolerance=1e-8,
        user_info=x
    )
    max_error = np.max(abs(params - parameters))
    print(f"\tmean abs error = {np.mean(abs(params - parameters)):2.2e}")
    print(f"\tmax abs error = {max_error:2.2e}")
    # Single quotes inside the f-string keep this line valid before Python 3.12.
    print(f"\tHas{'' if np.any(abs(params - parameters) > 1e-3) else ' no'} fit with error > 1e-3")
    return max_error

iters = 100
n_fits = 1700000
max_errors = np.zeros(iters, dtype=float)
for i in range(iters):
    print(f"Test run {i+1:3d}")
    max_errors[i] = test(n_fits)
print(f"Overall max error after : {np.max(max_errors):2.2e}")
output
Test run 1
mean abs error = 1.73e-07
max abs error = 2.41e-06
Has no fit with error > 1e-3
Test run 2
mean abs error = 1.72e-07
max abs error = 2.34e-06
Has no fit with error > 1e-3
...
Test run 99
mean abs error = 1.72e-07
max abs error = 2.41e-06
Has no fit with error > 1e-3
Test run 100
mean abs error = 1.72e-07
max abs error = 2.32e-06
Has no fit with error > 1e-3
Overall max error after : 2.71e-06
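Each run above draws fresh, unseeded random data, so a run that did show a large error could not be replayed exactly. A sketch of seeded data generation follows. `make_linear_data` is a hypothetical helper. It relies on NumPy's `Generator.random`, which can produce float32 directly and so also avoids the float64 temporary created by `np.random.rand(...).astype(np.float32)`.

```python
import numpy as np

def make_linear_data(n_fits, n_points=300, seed=0):
    """Seeded float32 test data for y = p0 + p1 * x, with one x vector per fit."""
    rng = np.random.default_rng(seed)
    params = rng.random((n_fits, 2), dtype=np.float32)
    x = rng.random((n_fits, n_points), dtype=np.float32)
    x *= 100                                   # scale in place, no extra copy
    y = params[:, :1] + params[:, 1:] * x
    return params, x, y

# The same seed reproduces identical inputs.
params_a, x_a, y_a = make_linear_data(1000, seed=42)
params_b, x_b, y_b = make_linear_data(1000, seed=42)
print(np.array_equal(y_a, y_b), y_a.dtype)
```

The seed and n_fits of a failing run could then be posted with the report, so that others feed identical inputs to `gf.fit`.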
@DDAWX can you try running my code on your machine to reproduce the problem?