Infinite loop or never returns for logistic regression in nearly degenerate case using scikit learn
Description
With scikit-learn, LogisticRegression.fit never returns when the data is nearly degenerate. The scikit-learn project pointed to liblinear as the source of the problem.
Steps/Code to Reproduce
import sklearn.linear_model
import numpy as np
model = sklearn.linear_model.LogisticRegression()
num_pts = 15
x = np.zeros((num_pts*2, 2))
x[3] = 3.7491010398553741e-208
y = np.append(np.zeros(num_pts), np.ones(num_pts))
model.fit(x, y)
Expected Results
Return or throw error.
Actual Results
Never returns.
Versions
Linux-2.6.32-573.18.1.el6.x86_64-x86_64-with-redhat-6.7-Carbon
('Python', '2.7.12 |Anaconda 2.0.1 (64-bit)| (default, Jul 2 2016, 17:42:40) \n[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]')
('NumPy', '1.11.0')
('SciPy', '0.17.0')
('Scikit-Learn', '0.17.1')
Can you try to reproduce it with the liblinear command-line interface? Otherwise it might be a numerical issue caused by us (sklearn). Also, how about scaling your data ;)
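For reference, here is a minimal sketch of what scaling the data from the report could look like, using only NumPy. The helper name `max_abs_scale` is hypothetical and not part of scikit-learn. It divides each column by its largest absolute value and leaves all-zero columns untouched. Whether this avoids the hang in the affected versions has not been verified here.

```python
import numpy as np


def max_abs_scale(x):
    """Divide each column by its largest absolute value (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    scale = np.max(np.abs(x), axis=0)
    # Leave all-zero columns unchanged instead of dividing by zero.
    scale[scale == 0.0] = 1.0
    return x / scale


# Same data as in the report.
num_pts = 15
x = np.zeros((num_pts * 2, 2))
x[3] = 3.7491010398553741e-208

x_scaled = max_abs_scale(x)
print(x_scaled[3])
```

After scaling, the single non-zero row has magnitude 1 instead of roughly 1e-208, so products of feature values no longer underflow.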
Thanks for reporting this issue. We looked into it and found that the problem comes from a gradient norm that is too small at the start, which leads to an infinite loop in the conjugate gradient subroutine. It can be fixed by setting a maximum number of CG iterations. We are going to fix it in the next release. Thanks.
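To illustrate the idea of capping CG iterations, here is a minimal Python sketch of a conjugate gradient loop with an explicit iteration limit. It is not liblinear's code: the function name `capped_cg` and the parameters `rel_tol` and `max_iter` are assumptions made for this example. The point is only that the loop exits after `max_iter` steps even if the residual test never succeeds.

```python
import numpy as np


def capped_cg(hess_vec, grad, rel_tol=0.1, max_iter=100):
    """Approximately solve H s = -grad, running at most max_iter CG steps.

    hess_vec(d) must return the product H d. All names here are
    illustrative assumptions, not liblinear's API.
    """
    grad = np.asarray(grad, dtype=float)
    step = np.zeros_like(grad)
    residual = -grad                      # residual of H s = -grad at s = 0
    direction = residual.copy()
    rr = residual @ residual
    threshold = rel_tol * np.sqrt(grad @ grad)
    iterations = 0
    while iterations < max_iter:          # the cap: never loop forever
        if np.sqrt(rr) <= threshold:      # usual convergence test
            break
        hd = hess_vec(direction)
        curvature = direction @ hd
        if curvature <= 0.0:              # no usable curvature, give up
            break
        alpha = rr / curvature
        step += alpha * direction
        residual -= alpha * hd
        rr_new = residual @ residual
        direction = residual + (rr_new / rr) * direction
        rr = rr_new
        iterations += 1
    return step, iterations


# Well-behaved case: a small symmetric positive definite system.
H = np.array([[4.0, 1.0], [1.0, 3.0]])
g = np.array([-1.0, -2.0])
s, n_ok = capped_cg(lambda d: H @ d, g, rel_tol=1e-10, max_iter=10)

# Pathological case: the convergence test can never succeed (NaN products),
# so only the iteration cap stops the loop.
_, n_bad = capped_cg(lambda d: np.full_like(d, np.nan), g, max_iter=5)
```

In the pathological call every comparison against NaN is false, so without `max_iter` the `while` loop would not terminate. With the cap it stops after five steps.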
Thanks, that's awesome.
Sorry for not providing a more precise source of the error.
This issue was moved to angleto/liblinear#10