FSharp.Stats
Goodness of fit functions for OrdinaryLeastSquares.Linear.Multivariable?
Is your feature request related to a problem? Please describe. I am following the goodness of fit quality tutorial. I want t-statistics and standard errors for coefficients from regressions with multiple independent variables. Are there functions to do this already?
The multivariate fit function (see https://fslab.org/FSharp.Stats/Fitting.html#Multivariable) has type x:Vector<float> -> float, but GoodnessOfFit.calculateSumOfSquares expects float -> float. It appears that there is no "multivariable" version.
Currently there is no implementation of calculateSumOfSquares available for multivariate regression. I think the function could either be generalized to accept multi-dimensional input (calculateSumOfSquares (fitFunc: 'T -> float) (xData : 'U) (yData : 'T)) or be complemented by a specialized function (e.g. calculateSumOfSquaresMultivariate) that accepts matrices and vectors. The function name should perhaps be shortened.
While the first option may lead to an ambiguous signature that is difficult to interpret, the second option adds a very similar function to the module.
Do you prefer one of these options or have another idea?
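To make the first option concrete, here is a minimal sketch in plain F# with no FSharp.Stats dependency. The names calculateSumOfSquaresGeneric, SumOfSquaresSketch and rSquared are hypothetical and are not part of the library; the record only loosely mirrors what a sum-of-squares helper would return. The idea is that the predictor type 'X stays generic, so the same function serves a float -> float fit and a fit over a vector-like input.

```fsharp
// Hypothetical result type for illustration; not the library's own record.
type SumOfSquaresSketch =
    { Regression : float   // explained:   sum of (fitted - meanY)^2
      Residual   : float   // unexplained: sum of (observed - fitted)^2
      Total      : float   // sum of (observed - meanY)^2
      Count      : int }

/// Generic over the predictor type 'X: float for one predictor,
/// float[] (or any vector-like type) for several predictors.
let calculateSumOfSquaresGeneric (fitFunc: 'X -> float) (xData: seq<'X>) (yData: seq<float>) =
    let xs = Array.ofSeq xData
    let ys = Array.ofSeq yData
    if xs.Length <> ys.Length then
        invalidArg "yData" "xData and yData must have the same length"
    let meanY  = Array.average ys
    let fitted = Array.map fitFunc xs
    let ssr = fitted |> Array.sumBy (fun f -> (f - meanY) ** 2.0)
    let sse = Array.map2 (fun y f -> (y - f) ** 2.0) ys fitted |> Array.sum
    let sst = ys |> Array.sumBy (fun y -> (y - meanY) ** 2.0)
    { Regression = ssr; Residual = sse; Total = sst; Count = ys.Length }

/// Coefficient of determination from the sums of squares.
let rSquared (ss: SumOfSquaresSketch) = 1.0 - ss.Residual / ss.Total

// One predictor: y = 1 + 2x
let uni =
    calculateSumOfSquaresGeneric (fun x -> 1.0 + 2.0 * x) [ 0.0; 1.0; 2.0 ] [ 1.0; 3.0; 5.0 ]

// Two predictors: y = 1 + 2*x1 + 3*x2
let multi =
    calculateSumOfSquaresGeneric
        (fun (x: float[]) -> 1.0 + 2.0 * x.[0] + 3.0 * x.[1])
        [ [| 0.0; 0.0 |]; [| 1.0; 0.0 |]; [| 0.0; 1.0 |] ]
        [ 1.0; 3.0; 4.0 ]
```

Because both example fits reproduce their data exactly, the residual sum of squares is zero in each case. Specialized wrappers with fixed signatures (for example one taking a matrix and a vector) could be layered on top of a function like this, which would keep the clear entry points while sharing one implementation.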
My preference is for there to be one calculateSumOfSquares function that operates on regressions regardless of the number of parameters.
But this is part of a bigger comment (speaking to https://github.com/fslaborg/FSharp.Stats/issues/94). I haven't understood why there is one API entry point for OLS regressions with one feature and another for OLS regressions with more than one feature. The multivariable functions would produce the same results as the univariable ones if you used an Nx1 matrix (N observations * 1 parameter) instead of an N-length vector as x. The math should be the same; is the split there for some computational reason, to allow better performance when there is one feature?
To me (and perhaps this is just me coming from a different discipline) it overcomplicates the API surface.
The current structure emerged from our everyday data analysis work and was shaped by the chronological order in which it was implemented. I absolutely agree that a generic function is missing and that it might be straightforward to implement. Nevertheless, specialized functions with clear signatures are an easier and frustration-free entry point for our students, so that they do not confuse e.g. matrix orientations.
Long story short, it would be great to have a generic implementation of calculateSumOfSquares, which could have specialized functions built on top of it afterwards.
For determining the significance of regression coefficients, there are F test statistics available at Testing.TestStatistics.FTestStatistics. In the process of renewing the Fitting module (#94) we aim to reduce the highly branched module structure. A generic function to test the coefficients (as in GoodnessOfFit.ttestIntercept for univariable simple linear regression) could then be introduced.
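For reference, the standard errors and t-statistics asked about at the top of the thread follow from the usual OLS formulas. The notation here is an assumption for illustration and not taken from the library: X is the n x p design matrix (including a column of ones if there is an intercept), y the response vector, and SS_res the residual sum of squares.

```latex
\hat{\beta} = (X^{\top} X)^{-1} X^{\top} y,
\qquad
\hat{\sigma}^{2} = \frac{SS_{\mathrm{res}}}{n - p}

\mathrm{SE}(\hat{\beta}_j) = \sqrt{\hat{\sigma}^{2} \, \bigl[(X^{\top} X)^{-1}\bigr]_{jj}},
\qquad
t_j = \frac{\hat{\beta}_j}{\mathrm{SE}(\hat{\beta}_j)}
```

Under the usual assumptions, each t_j is compared against a t distribution with n - p degrees of freedom. A generic coefficient test would need the residual sum of squares for any number of predictors, which is where a generalized calculateSumOfSquares comes in.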
Thanks for the context. I fully support your library being optimized for your needs and priorities. And I agree that clear signatures are nice. Maybe calculateSumOfSquares.Multivariable... Anyway, there are more big-picture thoughts in my comment on #94, but keep in mind that my comments there are in the spirit of idea generation. You are the developers and your needs are the priority.
Thank you for pointing me to the Testing code. I will check that out.