Gpufit: Incorrect fitting results when n_fits is very large

Hi, I'm encountering an issue when running gpufit with a large number of fits.

When n_fits = 1,000,000 or 1,700,000, the fitting results are incorrect. However, with n_fits = 1,600,000, the results are correct. Because the failure does not grow steadily with the number of fits, I suspect a memory-related bug or an overflow.
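The overflow hypothesis can be sized up with a little arithmetic. The sketch below assumes that both `data` (y) and `user_info` (x) in the reproduction are float32 arrays of shape (n_fits, 300) and compares their sizes with the signed 32-bit limit. It says nothing about how Gpufit indexes these buffers internally. For all three values of n_fits, a single buffer stays below INT32_MAX bytes (2.04e9 bytes at 1,700,000 fits), so a simple 32-bit overflow of one buffer's byte size would not, by itself, explain why 1,600,000 succeeds while 1,000,000 fails.

```python
# Back-of-the-envelope buffer sizes for the reproduction script.
# Assumption: data (y) and user_info (x) are float32 arrays of shape
# (n_fits, n_points); Gpufit's internal indexing is not modelled here.
INT32_MAX = 2**31 - 1
BYTES_PER_FLOAT32 = 4


def buffer_sizes(n_fits: int, n_points: int = 300) -> dict:
    """Element and byte counts for one (n_fits, n_points) float32 buffer."""
    elements = n_fits * n_points
    nbytes = elements * BYTES_PER_FLOAT32
    return {
        "elements": elements,
        "bytes": nbytes,
        "bytes_exceed_int32": nbytes > INT32_MAX,
        # data and user_info together
        "data_plus_user_info_bytes": 2 * nbytes,
    }


for n in (1_000_000, 1_600_000, 1_700_000):
    s = buffer_sizes(n)
    print(f"n_fits={n:>9,}: {s['elements']:>11,} elements, "
          f"{s['bytes'] / 1e9:.2f} GB per buffer, "
          f"exceeds INT32_MAX bytes: {s['bytes_exceed_int32']}")
```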

Below is a minimal example to reproduce the issue:

import numpy as np
import pygpufit.gpufit as gf

print('CUDA available: {}'.format(gf.cuda_available()))
print('CUDA versions runtime: {}, driver: {}'.format(*gf.get_cuda_version()))


def test(n_fits):
    params = np.random.rand(n_fits, 2).astype(np.float32)

    x = np.random.rand(n_fits, 300).astype(np.float32) * 100
    y = params[:, :1] + params[:, 1:] * x

    init_params = np.array([0.1, 0.1], dtype=np.float32)
    init_params = np.tile(init_params, (n_fits, 1))

    results = gf.fit(data=y,
                     weights=None,
                     model_id=gf.ModelID.LINEAR_1D,
                     initial_parameters=init_params,
                     tolerance=1e-8,
                     user_info=x)
    print("n_fits=", n_fits, "true:", params[-1], "preds:", results[0][-1])


test(1000000)
test(1600000)
test(1700000)

Output:

n_fits= 1000000 true: [0.73556924 0.9035081 ] preds: [ 4.8269367e+01 -8.0045573e-03]
n_fits= 1600000 true: [0.08675532 0.55433005] preds: [0.086756 0.55433005]
n_fits= 1700000 true: [0.61472124 0.9756066 ] preds: [54.71934 -0.07528704]

As shown, for n_fits = 1,000,000 and 1,700,000, the results are clearly incorrect, while 1,600,000 gives the expected values. The model being used is LINEAR_1D, which is normally very stable, so this behavior is unexpected.
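The reproduction only prints the last fit. Comparing all parameters shows whether the wrong results are scattered or confined to a contiguous block of fit indices; a contiguous block would point more toward an indexing or memory problem than toward numerics. `bad_fit_ranges` below is a hypothetical helper, not part of pygpufit. Inside `test()` it could be called as `bad_fit_ranges(params, results[0])`.

```python
import numpy as np


def bad_fit_ranges(true_params, fitted_params, tol=1e-3):
    """Return contiguous [start, stop) index ranges of fits whose largest
    absolute parameter error exceeds tol. Hypothetical helper, not pygpufit API."""
    err = np.max(np.abs(np.asarray(true_params) - np.asarray(fitted_params)), axis=1)
    bad = err > tol
    if not bad.any():
        return []
    # Pad with False on both sides so every run of True has an on and an off edge.
    padded = np.concatenate(([False], bad, [False])).astype(np.int8)
    edges = np.flatnonzero(np.diff(padded))
    starts, stops = edges[0::2], edges[1::2]
    return list(zip(starts.tolist(), stops.tolist()))


# Synthetic demo: fits 2, 3 and 4 are "wrong".
true = np.zeros((6, 2), dtype=np.float32)
fitted = true.copy()
fitted[2:5, 0] = 50.0
print(bad_fit_ranges(true, fitted))  # [(2, 5)]
```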

Jun 07 '25 05:06 DDAWX

Thanks for reporting this issue. Could you please tell us which OS and which hardware you are using? Thanks

Jun 10 '25 10:06 superchromix

NAME="Ubuntu"
VERSION="20.04.3 LTS (Focal Fossa)"

Jun 12 '25 13:06 DDAWX

Hi,

I ran this code to test my installation, and I cannot reproduce the problem. My fit results look fine:

import numpy as np
import pygpufit.gpufit as gf


print('CUDA available: {}'.format(gf.cuda_available()))
print('CUDA versions runtime: {}, driver: {}'.format(*gf.get_cuda_version()))


def test(n_fits):
    params = np.random.rand(n_fits, 2).astype(np.float32)

    x = np.random.rand(n_fits, 300).astype(np.float32) * 100
    y = params[:, :1] + params[:, 1:] * x

    init_params = np.array([0.1, 0.1], dtype=np.float32)
    init_params = np.tile(init_params, (n_fits, 1))

    parameters, states, chi_squares, number_iterations, execution_time = gf.fit(
        data=y,
        weights=None,
        model_id=gf.ModelID.LINEAR_1D,
        initial_parameters=init_params,
        tolerance=1e-8,
        user_info=x
        )
    print(f"\nn_fits= {n_fits:e}, true: {params[-1]}, preds: {parameters[-1]}, delta:, {abs(params[-1]-parameters[-1])}")
    print(f"states:", states[-1], "chi_squares:", chi_squares[-1],
          "iterations:", number_iterations[-1], "time:", execution_time)
    return parameters, states, chi_squares, number_iterations, execution_time

test(1000000)
test(1600000)
test(1700000)

Output:

CUDA available: True
CUDA versions runtime: (12, 0), driver: (12, 4)

n_fits= 1.000000e+06, true: [0.7809187 0.7692733], preds: [0.78091776 0.7692733 ], delta:, [9.536743e-07 0.000000e+00]
states: 0 chi_squares: 1.8235937e-09 iterations: 4 time: 1.1855392100000017

n_fits= 1.600000e+06, true: [0.828089   0.38862136], preds: [0.8280882 0.3886214], delta:, [7.7486038e-07 2.9802322e-08]
states: 0 chi_squares: 3.8443915e-10 iterations: 4 time: 1.7962752590000264

n_fits= 1.700000e+06, true: [0.99378943 0.7045424 ], preds: [0.9937897 0.7045424], delta:, [2.3841858e-07 0.0000000e+00]
states: 0 chi_squares: 6.031087e-11 iterations: 4 time: 1.9028417580000223

Sysinfo:

cat /etc/os-release
>>>PRETTY_NAME="Ubuntu 24.04.2 LTS"
>>>NAME="Ubuntu"
>>>VERSION_ID="24.04"
>>>VERSION="24.04.2 LTS (Noble Numbat)"
>>>VERSION_CODENAME=noble
>>>ID=ubuntu
>>>ID_LIKE=debian
>>>HOME_URL="https://www.ubuntu.com/"
>>>SUPPORT_URL="https://help.ubuntu.com/"
>>>BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
>>>PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
>>>UBUNTU_CODENAME=noble
>>>LOGO=ubuntu-logo

uname -r
>>>6.14.0-24-generic

nvidia-smi

>>>Wed Jul 23 17:49:45 2025
>>>+-----------------------------------------------------------------------------------------+
>>>| NVIDIA-SMI 550.163.01             Driver Version: 550.163.01     CUDA Version: 12.4     |
>>>|-----------------------------------------+------------------------+----------------------+
>>>| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
>>>| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
>>>|                                         |                        |               MIG M. |
>>>|=========================================+========================+======================|
>>>|   0  NVIDIA GeForce RTX 3050        Off |   00000000:01:00.0 Off |                  N/A |
>>>|  0%   41C    P8             N/A /  115W |       9MiB /   8192MiB |      0%      Default |
>>>|                                         |                        |                  N/A |
>>>+-----------------------------------------+------------------------+----------------------+
>>>
>>>+-----------------------------------------------------------------------------------------+
>>>| Processes:                                                                              |
>>>|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
>>>|        ID   ID                                                               Usage      |
>>>|=========================================================================================|
>>>|    0   N/A  N/A      1635      G   /usr/lib/xorg/Xorg                              4MiB |
>>>+-----------------------------------------------------------------------------------------+


cat /proc/cpuinfo | grep "model name"
>>>model name      : AMD Ryzen 5 5600G with Radeon Graphics

Jul 23 '25 15:07 Nikongen

I'm fairly sure I ran into this, or a similar issue, before, and it was caused by floating-point precision errors propagating through the fit. I don't remember exactly how I tested it, but maybe try using double instead of float32? The accuracy also varied from run to run, so run it a few times to be sure. Beware of memory limits when you do this.
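To gauge how much float32 rounding alone can move a straight-line fit, the sketch below fits the same 300 points in float64 and in float32. It uses ordinary least squares via the normal equations, not the Levenberg-Marquardt iteration Gpufit runs, so it only illustrates the size of single-precision rounding on this kind of problem. If the float32 error it reports is tiny compared with the intercepts of about 48 and 55 reported above, rounding alone is an unlikely explanation for those cases.

```python
import numpy as np


def linear_fit(x, y):
    """Closed-form least-squares fit of y = a + b*x, computed in the dtype of x and y.

    Note: ordinary least squares via the normal equations, not Gpufit's
    Levenberg-Marquardt; used only to show single- vs double-precision rounding.
    """
    n = x.dtype.type(x.size)
    sx, sy = x.sum(), y.sum()
    sxx, sxy = (x * x).sum(), (x * y).sum()
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b


rng = np.random.default_rng(0)
a_true, b_true = rng.random(2)        # float64 ground truth
x64 = rng.random(300) * 100           # same x range as the reproduction
y64 = a_true + b_true * x64

a64, b64 = linear_fit(x64, y64)
a32, b32 = linear_fit(x64.astype(np.float32), y64.astype(np.float32))

err64 = max(abs(a64 - a_true), abs(b64 - b_true))
err32 = max(abs(float(a32) - a_true), abs(float(b32) - b_true))
print(f"float64 max abs error: {err64:.1e}")
print(f"float32 max abs error: {err32:.1e}")
```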

Aug 30 '25 01:08 lsaca05

I checked again. With pyGpufit it is not possible to use a data type other than np.float32 (see https://github.com/gpufit/Gpufit/issues/144). I still cannot reproduce the issue.

If the number of fits is too large, this test crashes because there is not enough system memory (RAM) to allocate the NumPy arrays. In my case, the loop crashed at n=4, i.e. 4e6 fits:

import numpy as np
import pygpufit.gpufit as gf


def test(n_fits, dtype=np.float32):
    params = np.random.rand(n_fits, 2).astype(dtype)

    x = np.random.rand(n_fits, 300).astype(dtype) * 100
    y = params[:, :1] + params[:, 1:] * x

    init_params = np.array([0.1, 0.1], dtype=dtype)
    init_params = np.tile(init_params, (n_fits, 1))

    parameters, states, chi_squares, number_iterations, execution_time = gf.fit(
        data=y,
        weights=None,
        model_id=gf.ModelID.LINEAR_1D,
        initial_parameters=init_params,
        tolerance=1e-8,
        user_info=x
        )
    # Check mean absolute error
    error = np.mean(abs(params - parameters))
    print(f"\n{n_fits:2.2e} fits with dtype: {dtype} -> mean abs error = {error:2.2e}")
    testindex = np.random.randint(0, n_fits)
    error_i = abs(params[testindex] - parameters[testindex])
    print(f"\tchi_squares: {chi_squares[testindex]:2.2e}"
          + f"\titerations: {number_iterations[testindex]:3d}"
          + f"\ttime: {execution_time:3.3f} ms")
    print(f"\tCheck fit result for random index: #{testindex}: "
          + f"\n\t\ttrue:\t{params[testindex][0]:2.3f}, {params[testindex][1]:2.3f}"
          + f"\n\t\tpreds:\t{parameters[testindex][0]:2.3f}, {parameters[testindex][1]:2.3f}"
          + f"\n\t\terror:\t{error_i[0]:2.3e}, {error_i[1]:2.3e}")


for n in range(1, 10):
    test(int(n*1e6))

Output:

1.00e+06 fits with dtype: <class 'numpy.float32'> -> mean abs error = 1.73e-07
        chi_squares: 3.92e-10   iterations:   4 time: 1.260 ms
        Check fit result for random index: #769649: 
                true:   0.017, 0.535
                preds:  0.017, 0.535
                error:  6.892e-08, 0.000e+00

2.00e+06 fits with dtype: <class 'numpy.float32'> -> mean abs error = 1.73e-07
        chi_squares: 5.16e-11   iterations:   4 time: 2.236 ms
        Check fit result for random index: #1026410: 
                true:   0.457, 0.330
                preds:  0.457, 0.330
                error:  1.490e-07, 0.000e+00

3.00e+06 fits with dtype: <class 'numpy.float32'> -> mean abs error = 1.72e-07
        chi_squares: 2.41e-09   iterations:   4 time: 3.349 ms
        Check fit result for random index: #1183742: 
                true:   0.815, 0.960
                preds:  0.815, 0.960
                error:  1.132e-06, 5.960e-08
Killed
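The crash at 4e6 fits is plausibly a host-memory issue. In `test()`, `np.random.rand(n_fits, 300)` first returns a float64 array, which `.astype(dtype)` then converts, so a transient float64 copy twice the size of `x` exists briefly. The sketch below only does the arithmetic for the large (n_fits, 300) arrays; they are not all alive at the same time, so the numbers are not an exact peak. As a possible mitigation, and assuming NumPy's `Generator` API (NumPy >= 1.17), `rng.random(..., dtype=np.float32)` draws float32 values directly and skips the float64 temporary.

```python
import numpy as np

N_POINTS = 300  # points per fit, as in the test script


def host_bytes(n_fits: int, n_points: int = N_POINTS) -> dict:
    """Byte sizes of the large (n_fits, n_points) host arrays built in test().

    The small params/init_params arrays are ignored.
    """
    elems = n_fits * n_points
    return {
        "rand_float64_temp": elems * 8,  # np.random.rand(...) is float64 before .astype()
        "x_float32": elems * 4,
        "y_float32": elems * 4,
    }


for n in (1_000_000, 3_000_000, 4_000_000):
    parts = ", ".join(f"{name}: {nbytes / 1e9:.1f} GB"
                      for name, nbytes in host_bytes(n).items())
    print(f"{n:.0e} fits -> {parts}")

# Possible mitigation: draw float32 directly (NumPy >= 1.17 Generator API),
# so no full-size float64 temporary is created.
rng = np.random.default_rng()
x_small = rng.random((1000, N_POINTS), dtype=np.float32) * 100
print(x_small.dtype)  # float32
```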

Sep 03 '25 14:09 Nikongen

Here is a small script that runs @DDAWX's problem size many times to check whether a large error ever shows up. All results are fine:

import numpy as np
import pygpufit.gpufit as gf


def test(n_fits: float = 1e6) -> float:
    n_fits = int(n_fits)
    params = np.random.rand(n_fits, 2).astype(np.float32)

    x = np.random.rand(n_fits, 300).astype(np.float32) * 100
    y = params[:, :1] + params[:, 1:] * x

    init_params = np.array([0.1, 0.1], dtype=np.float32)
    init_params = np.tile(init_params, (n_fits, 1))

    parameters, states, chi_squares, number_iterations, execution_time = gf.fit(
        data=y,
        weights=None,
        model_id=gf.ModelID.LINEAR_1D,
        initial_parameters=init_params,
        tolerance=1e-8,
        user_info=x
        )
    max_error = np.max(abs(params - parameters))
    print(f"\tmean abs error = {np.mean(abs(params - parameters)):2.2e}")
    print(f"\tmax abs error = {max_error:2.2e}")
    # Single quotes inside the braces keep this valid on Python < 3.12 as well
    print(f"\tHas{'' if np.any(abs(params - parameters) > 1e-3) else ' no'} fit with error > 1e-3")
    return max_error


iters = 100
n_fits = 1700000
max_errors = np.zeros(iters, dtype=float)
for i in range(iters):
    print(f"Test run {i+1:3d}")
    max_errors[i] = test(n_fits)

print(f"Overall max error after : {np.max(max_errors):2.2e}")

Output:

Test run   1
        mean abs error = 1.73e-07
        max abs error = 2.41e-06
        Has no fit with error > 1e-3
Test run   2
        mean abs error = 1.72e-07
        max abs error = 2.34e-06
        Has no fit with error > 1e-3

...

Test run  99
        mean abs error = 1.72e-07
        max abs error = 2.41e-06
        Has no fit with error > 1e-3
Test run 100
        mean abs error = 1.72e-07
        max abs error = 2.32e-06
        Has no fit with error > 1e-3
Overall max error after : 2.71e-06

@DDAWX, can you try running my code on your machine to see whether you can reproduce the problem?

Sep 05 '25 12:09 Nikongen
