statsample-glm

NotRegularMatrix exception for certain dataframes

Open lokeshh opened this issue 9 years ago • 6 comments

Statsample::GLM.compute is failing for certain dataframes.

> try = Daru::DataFrame.from_csv 'try.csv'
> Statsample::GLM.compute try, 'y', :logistic
ExceptionForMatrix::ErrNotRegular: Not Regular Matrix
from /home/ubuntu/.rvm/gems/ruby-2.2.3/gems/backports-3.6.8/lib/backports/1.9.2/stdlib/matrix.rb:933:in `block in inverse_from'

Get dataframe used in the above code here

lokeshh avatar Jul 13 '16 13:07 lokeshh

Weird bug.

v0dro avatar Jul 13 '16 14:07 v0dro

I think it's failing because a matrix inverse is being computed and the determinant is possibly very close to zero, which would explain the ErrNotRegular. If I'm right, changing the matrix inverse computation algorithm should make it work.

v0dro avatar Jul 13 '16 14:07 v0dro

Here's some info I found.

I printed all the matrices whose inverse the algorithm was computing. Here's the result:

...
Matrix[[-8.459899447643453e-14, -5.75239855749016e-12], [-5.75239855749016e-12, -10927.800950741155]]
Matrix[[-3.1308289294429086e-14, -2.128675014034775e-12], [-2.128675014034775e-12, -10927.800950740906]]
Matrix[[-1.1546319456101584e-14, -7.842171356742226e-13], [-7.842171356742226e-13, -10927.800950740813]]
Matrix[[-4.218847493575589e-15, -2.865041537347675e-13], [-2.865041537347675e-13, -10927.800950740779]]
Matrix[[-1.3322676295501873e-15, -8.997247391562266e-14], [-8.997247391562266e-14, -10927.800950740766]]
Matrix[[-6.661338147750937e-16, -4.4986236957811335e-14], [-4.4986236957811335e-14, -10927.800950740762]]
Matrix[[-0.0, -0.0], [-0.0, -10927.80095074076]]
ExceptionForMatrix::ErrNotRegular: Not Regular Matrix
from /home/ubuntu/.rvm/gems/ruby-2.2.3/gems/backports-3.6.8/lib/backports/1.9.2/stdlib/matrix.rb:933:in `block in inverse_from'

In the end it tries to compute the inverse of Matrix[[-0.0, -0.0], [-0.0, -10927.80095074076]], which is singular (its first row is all zeros), so no inverse exists.
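The failure can be reproduced without statsample-glm. A minimal sketch, assuming only Ruby's standard `matrix` library; the helper name `inverse_or_nil` is made up for illustration:

```ruby
require 'matrix'

# Hypothetical helper: returns the inverse, or nil when Matrix#inverse
# raises ErrNotRegular (that is, when the matrix is singular).
def inverse_or_nil(matrix)
  matrix.inverse
rescue Matrix::ErrNotRegular
  nil
end

# The last matrix printed before the exception.
last_matrix = Matrix[[-0.0, -0.0], [-0.0, -10927.80095074076]]

# The first row is all zeros, so the determinant is zero.
puts "singular? #{last_matrix.singular?}"
puts "inverse:  #{inverse_or_nil(last_matrix).inspect}"
```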

lokeshh avatar Jul 14 '16 05:07 lokeshh

@agisga might this be an issue with the algorithm or is it loss of precision in some of the calculations?

v0dro avatar Jul 21 '16 18:07 v0dro

It seems to me that the algorithm is theoretically okay, because it gives correct results most of the time. Maybe it fails because numerical error accumulates quickly when the input matrix is not well conditioned.
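To put a rough number on "not well conditioned": for a symmetric definite matrix, the ratio of the largest to the smallest diagonal entry (in absolute value) is a lower bound on the 2-norm condition number. A sketch applying that bound to the last matrix in the log above that could still be inverted, assuming it is negative definite as its entries suggest; the variable names are illustrative:

```ruby
# Second-to-last matrix from the log earlier in this thread.
h = [[-6.661338147750937e-16, -4.4986236957811335e-14],
     [-4.4986236957811335e-14, -10927.800950740762]]

# Ratio of the extreme diagonal magnitudes: a lower bound on the
# condition number of a symmetric definite matrix.
diagonal = [h[0][0].abs, h[1][1].abs]
cond_lower_bound = diagonal.max / diagonal.min

puts format('condition number >= %.3e', cond_lower_bound)
puts format('1 / Float::EPSILON  = %.3e', 1.0 / Float::EPSILON)
```

As a rule of thumb, once the condition number exceeds roughly 1 / Float::EPSILON, essentially no digits of a computed inverse can be trusted in double precision.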

In particular, since you mention matrix inverses, it sounds to me like the algorithm is not well optimized. It should be changed so that linear systems are solved instead of matrix inverses being computed (here is a very concise summary why). Solving a linear system is faster and numerically more stable than finding a matrix inverse.
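As an illustration of the difference, here is a minimal sketch using the LUP decomposition from Ruby's standard `matrix` library as a stand-in for a LAPACK-backed solver. The matrix `a` and vector `b` are hypothetical, well-conditioned placeholders, not values taken from statsample-glm:

```ruby
require 'matrix'

# Hypothetical, well-conditioned stand-in for the system solved in each
# iteration; not the actual matrices from statsample-glm.
a = Matrix[[4.0, 1.0], [1.0, 3.0]]
b = Vector[1.0, 2.0]

# Explicit inverse followed by a matrix-vector product.
x_inverse = a.inverse * b

# LU decomposition with partial pivoting, then forward/back substitution.
x_solve = a.lup.solve(b)

puts "via inverse: #{x_inverse}"
puts "via solve:   #{x_solve}"
```

Note that `solve` also raises `ErrNotRegular` when the matrix is exactly singular, so a change like this may help with ill-conditioned systems but would not by itself rescue a matrix with an all-zero row.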

Unfortunately, I don't have the time right now to look at the algorithm in detail, though I hope to get to it eventually. It would probably be best to rewrite it so that it uses the matrix decompositions and linear solvers provided by nmatrix-lapacke.

agisga avatar Jul 23 '16 04:07 agisga

Thanks for the explanation. I'm getting the same error; here is another example in case it's helpful. Data is available here: https://dl.dropboxusercontent.com/u/97188721/recruitment_failures.csv

data = Daru::DataFrame.from_csv 'recruitment_failures.csv'
glm = Statsample::GLM.compute data, 'failed_recruitment', :logistic

dansbits avatar Oct 23 '16 07:10 dansbits