Can GPU fit handle functions that returns vectors instead of a single value?
Hi,
I have a function that returns more than value that I have to fit, like a f : R^3 -> R^3. I was wondering if GPUFit can handle such cases as well, but it doesn't seem to do so.
Regards
Can you be more specific? In general, a multidimensional fit can be reformulated as a normal, scalar fit, and solved accordingly.
I meant something link:
f(x_1,x_2,x_3 ; alpha_1, alpha_2) = {alpha_1x_1 + alpha_2x_2, alpha_1x_2 + alpha_2x_3, alpha_1alpha_2x_1 + e^alpha_2 x_3};
So my samples are point in 3D space and I have to fit those with that vector function. My actual case is much more complicated and doesn't have a simple closed form expression.
You can also imagine the function has to be fit using a LSE minimizer.
Is it possible to fit such a function? Or is there a workaround for that?
Yes, it can but you have to use a customized model function and fit estimator. Writing your own model function and fit estimator you can easily fit functions with vectors as output in Gpufit. See: http://gpufit.readthedocs.io/en/latest/customization.html
I guess using the built-in LSE estimator would have problems though because it assumes that the number of model values is equal to the number of independent variable values.
According to the documentation I should...
- Define an additional model ID in file constants.h. (Easy...
- Implement a CUDA device function within a newly created .cuh file in folder Gpufit/Gpufit/models according to the following template.
__device__ void ... ( // ... = function name
float const * parameters,
int const n_fits,
int const n_points,
float * value,
float * derivative,
int const point_index,
int const fit_index,
int const chunk_index,
char * user_info,
std::size_t const user_info_size)
{
///////////////////////////// value //////////////////////////////
value[point_index] = ... ; // formula calculating fit model values
/////////////////////////// derivative ///////////////////////////
float * current_derivative = derivative + point_index;
current_derivative[0 * n_points] = ... ; // formula calculating derivative values with respect to parameters[0]
current_derivative[1 * n_points] = ... ; // formula calculating derivative values with respect to parameters[1]
.
.
.
}
How do I specify in this case, "the derivative is a matrix" and "the function should output three values" instead of one? I assume for the derivative I'll just address the array accordingly, it should be a problem.
- Include the newly created .cuh file in models.cuh (Or I just add my new model in an existing .cuh I guess)
- Add a switch case in the CUDA device function calculate_model() in file models.cuh to allow calling the newly added model function. (Easy to do I guess, is there anything I should keep in mind that I might possibly not considering?).
For the estimator I guess I'll be asking something later, but should I essentially implement a specific LSE estimater for my vector function?
Thx
Hi, could anyone give me more details based on what I asked above? I can't figure out how to proper index the derivatives and function values in my case. The documentation doesn't help much...
Hi,
I think what you're asking for is already possible using the existing Gpufit framework. Your function returns a vector of values, let's say 3 values, and you are fitting data that has three data values per coordinate.
In this situation you can treat the problem as a simultaneous fit to multiple datasets, with shared parameters. For example, let's say your data values are a 3 x N array (a vector of 3 values for N coordinates) with a value [Y1,Y2,Y3] at each coordinate N. You could concatenate these values into a long 1D array [Y1(x1), Y1(x2), Y1(x3)...... Y2(x1), Y2(x2), Y2(x3).... Y3(x1), Y3(x2), Y3(x3), .... ] for sending to Gpufit. Next, write your model function to return the appropriate value, depending on which element it is working on - i.e. if your model function is working on a point in the Y1 section of the array, then it should return the first element of the vector, if it is in the Y2 section, then the second element, and so on. The same is true for the derivative values. The function is always using the same parameter set, however.
The existing LSE or MLE estimators should already work for this case, without modification.
Does this solution address your question?
I fully agree with superchromix. The existing LSE and MLE estimators will work with vector-valued functions as long as the number of data points per fit is the number of x coordinates times number of values per fit.
Just for completeness: an alternative ordering would be [Y1(x1) Y2(x1) Y3(x1) Y1(x2) Y2(x2) Y3(x3) ...].
The task reduces to implement a custom model function. Me might not yet have an example for that case. Might be good to have one.
Hi superchromix, jkfindeisen.
Because there's no example that would expound the details of what you're proposing I'd need you to guide me... So please bear with me.
In this situation you can treat the problem as a simultaneous fit to multiple datasets, with shared parameters. For example, let's say your data values are a 3 x N array (a vector of 3 values for N coordinates) with a value [Y1,Y2,Y3] at each coordinate N.
So far I'm with you...
For example, let's say your data values are a 3 x N array (a vector of 3 values for N coordinates) with a value [Y1,Y2,Y3] at each coordinate N. You could concatenate these values into a long 1D array [Y1(x1), Y1(x2), Y1(x3)...... Y2(x1), Y2(x2), Y2(x3).... Y3(x1), Y3(x2), Y3(x3), .... ] for sending to Gpufit.
For the examples provided with GPUFit the generation of the samples is implemented in a separate function. In the case of the void gauss_fit_2d_example() {...} this is where the samples generation is performed. I assume I'll need to create a function like this as unit test for my model, which convey the generation of samples according to the layout you mentioned, is this correct?
Next, write your model function to return the appropriate value, depending on which element it is working on - i.e. if your model function is working on a point in the Y1 section of the array, then it should return the first element of the vector, if it is in the Y2 section, then the second element, and so on.
According to the templates in the customization page, what parameter/parameters of the model function would tell me on what section of the array I'd be working on?
The same is true for the derivative values. The function is always using the same parameter set, however.
About the derivatives/hessians, I'm understanding that in this case the derivative would be a a matrix, while the hessian would be a 3D array in theory (though I'd re-index a 1D array to reproduce the same structure) is this correct?
I assume I'll need to create a function like this as unit test for my model, which convey the generation of samples according to the layout you mentioned, is this correct?
It's up to you if you want to create such a unit test
According to the templates in the customization page, what parameter/parameters of the model function would tell me on what section of the array I'd be working on?
The "point index" tells you which element you are working on. Your model function would contain something like a switch statement, returning a different value depending on the point index.
About the derivatives/hessians, I'm understanding that in this case the derivative would be a a matrix, while the hessian would be a 3D array in theory (though I'd re-index a 1D array to reproduce the same structure) is this correct?
The derivatives are also returned as a 1D array, of length n_parameters * n_points. In the example I gave above for each parameter and data point you have three derivatives, and you need to return them with the same ordering as how you choose to organize the input data.
Continuing with the example, the final size of your derivatives array would be n_parameters * n_points * length_of_function_vector (three in this case).
To see how to organize the derivatives, you can look into e.g. the Gaussian model function code.
e.g. http://github.com/gpufit/Gpufit/blob/master/Gpufit/models/gauss_1d.cuh
No. The derivatives are also returned as a 1D array, of length n_parameters * n_points. In the example I gave above for each parameter and data point you have three derivatives, and you need to return them with the same ordering as how you choose to organize the input data.
But in my case that 1D array will represent the Jacobian of my model right? How about the hessian instead?
In the example of the 1D gaussian function (which dependes on 4 parameters) your 1D array of derivatives represents the gradient of the model with respect to the parameters to be fitted. This means if I had two components that 1D array should represent a matrix. Is that right?
(It will still be a 1D array, but it's the interpretation that changes).
Unless, again, I'm missing something.
You don't need to calculate the Hessian - you only want the derivative of the function with respect to each of the parameters. You are essentially treating your function, which returns a vector (of e.g. length three) as three separate functions.
I won't need the hessian because the estimator won't change, right? I'll re-use the ones already available.
Yes, you can use the standard estimator.
Everything boils down to just create a new model with the features you mentioned then, if this is true I'll follow the guidelines you just gave me. I'll eventually post the example I have in mind, if it is of any interest.
Great. Yes that would be helpful - we could add it to the docs. Post back here if you need any more guidance. You might want to create a simple toy model function to get a feeling for this first, before implementing your specific function.
I'll try. Thank you.
Hi, I've this toy example.
/* Description of the test function
* =================================
* Computes function value and derivatives of the function
*
* f(x[0],x[1],x[2]) = {p[0]*x[0] + p[1]*x[1],p[0]*x[1] + p[2]*x[2]}
* The derivative, namely the Jacobian with respect the the parameters is given by
* Jacobian(f) = {
* x[0], x[1], 0.0f,
* x[1], 0.0f, x[2]
* }
*
*/
__device__ void calculate_vector_function(
float const * parameters,
int const n_fits,
int const n_points,
float * value,
float * derivative,
int const point_index,
int const fit_index,
int const chunk_index,
char * user_info,
std::size_t const user_info_size) {
// indices
float * user_info_float = (float*)user_info;
float x = 0.0f;
if (!user_info_float)
{
x = point_index;
}
else if (user_info_size / sizeof(float) == n_points)
{
x = user_info_float[point_index];
}
else if (user_info_size / sizeof(float) > n_points)
{
int const chunk_begin = chunk_index * n_fits * n_points;
int const fit_begin = fit_index * n_points;
x = user_info_float[chunk_begin + fit_begin + point_index];
}
// parameters
float const * p = parameters;
// values and jacobian
if(!(point_index & 1)) {
value[point_index] = p[0]*x[0] + p[1]*x[1]; //first component
current_derivative[0 * n_points] = x[0]; //gradient first component
current_derivative[1 * n_points] = x[1];
current_derivative[2 * n_points] = 0.0f;
} else {
value[point_index] = p[0]*x[1] + p[2]*x[2]; //second component
current_derivative[3 * n_points] = x[1]; //gradient second component
current_derivative[4 * n_points] = 0.0f;
current_derivative[5 * n_points] = x[2];
}
}
Now, did you mean something like this? Also if you could clarify how do I take into account that some parameters are shared that would be helpful.
I think it should be something more like this.
This really depends on how you are passing the data into Gpufit. Below, I assume that the data is interleaved - i.e. [Y1(0), Y2(0), Y1(1), Y2(1), ...] . I also removed the parts involving user_info, since it's not strictly necessary to use custom x values.
__device__ void calculate_vector_function(
float const * parameters,
int const n_fits,
int const n_points,
float * value,
float * derivative,
int const point_index,
int const fit_index,
int const chunk_index,
char * user_info,
std::size_t const user_info_size)
{
// indices
float * user_info_float = (float*)user_info;
float x = 0.0f;
x = point_index / 2; // INTEGER DIVISION
// parameters
float const * p = parameters;
// derivative
float * current_derivative = derivative + point_index;
// values and jacobian
if(point_index % 2 == 0) {
// even point indices - first element of the vector
value[point_index] = p[0]*x + p[1]*x; //first component
current_derivative[0 * n_points] = x; //gradient first component
current_derivative[1 * n_points] = x;
current_derivative[2 * n_points] = 0.0f;
} else {
// odd point indices - second element of the vector
value[point_index] = p[0]*x[1] + p[2]*x[2]; //second component
current_derivative[0 * n_points] = x; //gradient second component
current_derivative[1 * n_points] = 0.0f;
current_derivative[2 * n_points] = x;
}
}
`
How about the shared parameters part? (You forgot to remove the array indices in the second component by the way). Still confused how is that supposed to be done. Moreover it seems to me you used the same coordinate x to compute the function value, but the function depends from three scalar values and returns two components. Basically as I said in the comment I'm trying to implement:
f(x[0],x[1],x[2]) = {p[0]*x[0] + p[1]*x[1],p[0]*x[1] + p[2]*x[2]}
So, if your function is
f(x,y,z) = (P0 * x + P1 * y, P0 * y + P2 * z)
Then you just have to choose how you want to organize the X, Y, and Z coordinates, and the two values of the function, in the input array. The same thing is done in the 2D Gaussian example.
one possible organization:
[Y0(x0,y0,z0),Y1(x0,y0,z0),Y0(x1,y0,z0),Y1(x1,y0,z0),........]
it's up to you to decide this.
If you have unequal numbers of X, Y, and Z values, then you would also have to specify the number of X, Y, and Z points to the model function, via user_info for example, or the function has no way of knowing what the x,y,z coordinates are.
So, if your function is f(x,y,z) = (P0 * x + P1 * y, P0 * x + P2 * y)
No, the function is
f(x,y,z) = (P0x + P1y,P0y + P2z);
For the coordinates organization I'm assuming this
x[0],y[0],z[0],x[1],y[1],z[1],...,x[i],y[i],z[i],....
If N are my samples, as you said I need an array 'v' of size 3*N, therefore I can retrive the coordinates x,y,z of the i-th sample as
(x,y,z) = (v[3i],v[3i+1],v[3*i+2]).
If your function lives in a three dimensional space, then the total data size must be N_x * N_y * N_z, where N_x is the number of X coordinates, etc.
Unless, that is, your data is unevenly sampled, in which case you need to use the user_info parameter to pass in the coordinates.
If your function lives in a three dimensional space, then the total data size must be N_x * N_y * N_z, where N_x is the number of X coordinates, etc.
I'm assuming N_x = N_y = N_y, so I'll have a total of N_x^3 samples.
Also I've modified my function model as follows
__device__ void calculate_vector_function(
float const * parameters,
int const n_fits,
int const n_points,
float * value,
float * derivative,
int const point_index,
int const fit_index,
int const chunk_index,
char * user_info,
std::size_t const user_info_size) {
// indices
int const n_points_x = cbrt((float)n_points);
int const point_index_z = point_index / (n_points_x*n_points_x);
int const point_index_y = (point_index - point_index_z*n_points_x*n_points_x)/n_points_x;
int const point_index_x = point_index - point_index_z*n_points_x*n_points_x - point_index_y * n_points_x;
float const argx = point_index_x;
float const argy = point_index_y;
float const argz = point_index_z;
// parameters
float const * p = parameters;
float * current_derivative = derivative + point_index;
// values and jacobian
if(point_index % 2 == 0) {
value[point_index] = p[0]*argx + p[1]*argy; //first component
current_derivative[0 * n_points] = argx; //gradient first component
current_derivative[1 * n_points] = argy;
current_derivative[2 * n_points] = 0.0f;
} else {
value[point_index] = p[0]*argy + p[2]*argz; //second component
current_derivative[0 * n_points] = argy; //gradient second component
current_derivative[1 * n_points] = 0.0f;
current_derivative[2 * n_points] = argz;
}
}
The reason for this bit:
// indices
int const n_points_x = cbrt((float)n_points);
int const point_index_z = point_index / (n_points_x*n_points_x);
int const point_index_y = (point_index - point_index_z*n_points_x*n_points_x)/n_points_x;
int const point_index_x = point_index - point_index_z*n_points_x*n_points_x - point_index_y * n_points_x;
It's because given that I have N^3 samples then the the the indexing in the linear array follows the formula
i*N^2 + j*N + k; //i index for x, j index for y, k index for z
The reason for this bit:
float const argx = point_index_x;
float const argy = point_index_y;
float const argz = point_index_z;
It's because the gauss 2D function does something similar, not sure why though.
The multi-dimensionality makes this "toy" model more complicated, but I think this is what you want.
__device__ void calculate_vector_function(
float const * parameters,
int const n_fits,
int const n_points,
float * value,
float * derivative,
int const point_index,
int const fit_index,
int const chunk_index,
char * user_info,
std::size_t const user_info_size)
{
// indices
int_const n_function_values = 2;
int const n_points_x = cbrt((float) (n_points/n_function_values));
int const n_points_y = n_points_x;
int const n_points_z = n_points_x;
int const point_index_z = point_index / (n_points_x*n_points_y*n_function_values);
int const point_index_y = (point_index
- (point_index_z*n_points_x*n_points_y*n_function_values) )
/ (n_points_x*n_function_values);
int const point_index_x = ( (point_index
- (point_index_z*n_points_x*n_points_y*n_function_values))
- (point_index_y*n_points_x*n_function_values) )
/ (n_function_values);
float const x = (float) point_index_x;
float const y = (float) point_index_y;
float const z = (float) point_index_z;
// parameters
float const * p = parameters;
// derivative
float * current_derivative = derivative + point_index;
// values and jacobian
if(point_index % 2 == 0) {
// even point indices - first element of the vector
value[point_index] = p[0]*x + p[1]*y; //first component
current_derivative[0 * n_points] = x; //gradient first component
current_derivative[1 * n_points] = y;
current_derivative[2 * n_points] = 0.0f;
} else {
// odd point indices - second element of the vector
value[point_index] = p[0]*y + p[2]*z; //second component
current_derivative[0 * n_points] = x; //gradient second component
current_derivative[1 * n_points] = 0.0f;
current_derivative[2 * n_points] = z;
}
}
And how do I use this for the fitting now? Should I fit just one function? or two? The code below is essentially an attempt to modify the gauss2D example in order to fit my vector field. Can you tell me if my set up (before the call to the gpufit function) is correct?
void vector_function_example() {
/*
Generating sample test for fitting a vector function
*/
// number of fits, fit points and parameters
size_t const n_components = 2;
size_t const n_fits = n_components * 1;;
size_t const size_x = 50;
size_t const n_points_per_fit = size_x * size_x* size_x;
size_t const n_model_parameters = 3;
// true parameters (amplitude, center x position, center y position, width, offset)
std::vector< float > true_parameters{ 1.f, 1.5f, 2.5f};
std::cout << "generate example data" << std::endl;
// initialize random number generator
std::mt19937 rng;
rng.seed(0);
std::uniform_real_distribution< float> uniform_dist(0, 1);
// initial parameters (randomized)
std::vector< float > initial_parameters(n_model_parameters);
//for (size_t i = 0; i < n_fits; i++)
//{
for (size_t j = 0; j < n_model_parameters; j++)
{
initial_parameters[j] = true_parameters[j] + uniform_dist(rng); //This is probably wrong...
}
//}
// generate x, y and z values
std::vector< float > x(n_points_per_fit);
std::vector< float > y(n_points_per_fit);
std::vector< float > z(n_points_per_fit);
for (size_t i = 0; i < size_x; i++)
{
for (size_t j = 0; j < size_x; j++)
{
for(size_t k = 0; k < size_x; k++) {
x[i * size_x*size_x + j*size_x + k] = static_cast<float>(k);
y[i * size_x*size_x + j*size_x + k] = static_cast<float>(j);
z[i * size_x*size_x + j*size_x + k] = static_cast<float>(i);
}
}
}
// generate test data with Poisson noise
std::vector< float > temp(2*n_points_per_fit); //two components per fit
generate_vector_function(x, y, z, true_parameters, temp);
std::vector< float > data(n_fits * n_points_per_fit);
for (size_t i = 0; i < n_fits; i++)
{
for (size_t j = 0; j < n_points_per_fit; j++)
{
std::poisson_distribution< int > poisson_dist(temp[j]+0.1);
data[i * n_points_per_fit + j] = static_cast<float>(poisson_dist(rng));
}
}
// tolerance
float const tolerance = 0.001f;
// maximum number of iterations
int const max_number_iterations = 20;
// estimator ID
int const estimator_id = LSE;
// model ID
int const model_id = VECTOR_FUNCTION;
// parameters to fit (all of them)
std::vector< int > parameters_to_fit(n_model_parameters, 1);
// output parameters
std::vector< float > output_parameters(n_model_parameters);
std::vector< int > output_states(n_fits);
std::vector< float > output_chi_square(n_fits);
std::vector< int > output_number_iterations(n_fits);
// call to gpufit (C interface)
std::chrono::high_resolution_clock::time_point time_0 = std::chrono::high_resolution_clock::now();
int const status = gpufit
(
n_fits,
n_points_per_fit,
data.data(),
0,
model_id,
initial_parameters.data(),
tolerance,
max_number_iterations,
parameters_to_fit.data(),
estimator_id,
0,
0,
output_parameters.data(),
output_states.data(),
output_chi_square.data(),
output_number_iterations.data()
);
std::chrono::high_resolution_clock::time_point time_1 = std::chrono::high_resolution_clock::now();
// check status
if (status != ReturnState::OK)
{
throw std::runtime_error(gpufit_get_last_error());
}
// print execution time
std::cout << "execution time "
<< std::chrono::duration_cast<std::chrono::milliseconds>(time_1 - time_0).count() << " ms" << std::endl;
// get fit states
std::vector< int > output_states_histogram(5, 0);
for (std::vector< int >::iterator it = output_states.begin(); it != output_states.end(); ++it)
{
output_states_histogram[*it]++;
}
std::cout << "ratio converged " << (float)output_states_histogram[0] / n_fits << "\n";
std::cout << "ratio max iteration exceeded " << (float)output_states_histogram[1] / n_fits << "\n";
std::cout << "ratio singular hessian " << (float)output_states_histogram[2] / n_fits << "\n";
std::cout << "ratio neg curvature MLE " << (float)output_states_histogram[3] / n_fits << "\n";
std::cout << "ratio gpu not read " << (float)output_states_histogram[4] / n_fits << "\n";
// compute mean of fitted parameters for converged fits
std::vector< float > output_parameters_mean(n_model_parameters, 0);
for (size_t i = 0; i != n_fits; i++)
{
if (output_states[i] == FitState::CONVERGED)
{
for (size_t j = 0; j < n_model_parameters; j++)
{
output_parameters_mean[j] += output_parameters[i * n_model_parameters + j];
}
}
}
// normalize
for (size_t j = 0; j < n_model_parameters; j++)
{
output_parameters_mean[j] /= output_states_histogram[0];
}
// compute std of fitted parameters for converged fits
std::vector< float > output_parameters_std(n_model_parameters, 0);
for (size_t i = 0; i != n_fits; i++)
{
if (output_states[i] == FitState::CONVERGED)
{
for (size_t j = 0; j < n_model_parameters; j++)
{
output_parameters_std[j]
+= (output_parameters[i * n_model_parameters + j] - output_parameters_mean[j])
* (output_parameters[i * n_model_parameters + j] - output_parameters_mean[j]);
}
}
}
// normalize and take square root
for (size_t j = 0; j < n_model_parameters; j++)
{
output_parameters_std[j] = sqrt(output_parameters_std[j] / output_states_histogram[0]);
}
// print true value, fitted mean and std for every parameter
for (size_t j = 0; j < n_model_parameters; j++)
{
std::cout
<< "parameter " << j
<< " true " << true_parameters[j]
<< " fitted mean " << output_parameters_mean[j]
<< " std " << output_parameters_std[j] << std::endl;
}
// compute mean chi-square for those converged
float output_chi_square_mean = 0;
for (size_t i = 0; i != n_fits; i++)
{
if (output_states[i] == FitState::CONVERGED)
{
output_chi_square_mean += output_chi_square[i];
}
}
output_chi_square_mean /= static_cast<float>(output_states_histogram[0]);
std::cout << "mean chi square " << output_chi_square_mean << std::endl;
// compute mean number of iterations for those converged
float output_number_iterations_mean = 0;
for (size_t i = 0; i != n_fits; i++)
{
if (output_states[i] == FitState::CONVERGED)
{
output_number_iterations_mean += static_cast<float>(output_number_iterations[i]);
}
}
// normalize
output_number_iterations_mean /= static_cast<float>(output_states_histogram[0]);
std::cout << "mean number of iterations " << output_number_iterations_mean << std::endl;
}
Note that my modified example may not be the fastest solution - it is meant for illustrating the point. Also, the modulo operator is supposedly slow in CUDA.