
solution dependence on regularization path

Status: Open · hugoguh opened this issue 9 years ago • 14 comments

I got negative pseudo-R2s on the V4 data

So I checked: running with the default reg_lambdas seems to work fine:

import numpy as np
import scipy.sparse as sps
from sklearn.preprocessing import StandardScaler
from pyglmnet import GLM

# create an instance of the GLM class
model = GLM(distr='poissonexp', verbose=False, alpha=0.05)

n_samples, n_features = 10000, 100

# coefficients
beta0 = np.random.normal(0.0, 1.0, 1)
beta = sps.rand(n_features, 1, 0.1)
beta = np.array(beta.todense())

# training data
Xr = np.random.normal(0.0, 1.0, [n_samples, n_features])
yr = model.simulate(beta0, beta, Xr)

# testing data
Xt = np.random.normal(0.0, 1.0, [n_samples, n_features])
yt = model.simulate(beta0, beta, Xt)

# fit Generalized Linear Model
scaler = StandardScaler().fit(Xr)
model.fit(scaler.transform(Xr), yr)

# after .fit(), fit parameters are available for each reg_lambda on the path;
# here we take the last set of fit parameters
fit_param = model[-1].fit_

# we can use fitted parameters to predict
yhat = model.predict(scaler.transform(Xt))
print('reg_lambdas:', model.reg_lambda)
print('pseudo_R2s:', [model.pseudo_R2(yt, yh, np.mean(yr)) for yh in yhat])

Output:

reg_lambdas: [ 0.5         0.3237394   0.2096144   0.13572088  0.08787639  0.0568981
  0.03684031  0.02385332  0.01544452  0.01      ]
pseudo_R2s: [0.43841608601335824, 0.49823894207455854, 0.54805964244862948, 
0.5411179410997351, 0.56589624430495222, 0.52914657441486135, 0.57277441499592874, 
0.52690973730494284, 0.57043414426245309, 0.53886579676913726]

However, if I choose another range, even one that is not that different, here's what happens:

import numpy as np
import scipy.sparse as sps
from sklearn.preprocessing import StandardScaler
from pyglmnet import GLM

# create an instance of the GLM class
model = GLM(distr='poissonexp', verbose=False, alpha=0.05, reg_lambda=[0,0.0001,0.001,0.01,0.1,0.2,0.5])

n_samples, n_features = 10000, 100

# coefficients
beta0 = np.random.normal(0.0, 1.0, 1)
beta = sps.rand(n_features, 1, 0.1)
beta = np.array(beta.todense())

# training data
Xr = np.random.normal(0.0, 1.0, [n_samples, n_features])
yr = model.simulate(beta0, beta, Xr)

# testing data
Xt = np.random.normal(0.0, 1.0, [n_samples, n_features])
yt = model.simulate(beta0, beta, Xt)

# fit Generalized Linear Model
scaler = StandardScaler().fit(Xr)
model.fit(scaler.transform(Xr), yr)

# after .fit(), fit parameters are available for each reg_lambda on the path;
# here we take the last set of fit parameters
fit_param = model[-1].fit_

# we can use fitted parameters to predict
yhat = model.predict(scaler.transform(Xt))
print('reg_lambdas:', model.reg_lambda)
print('pseudo_R2s:', [model.pseudo_R2(yt, yh, np.mean(yr)) for yh in yhat])

Output:

reg_lambdas: [0, 0.0001, 0.001, 0.01, 0.1, 0.2, 0.5]
pseudo_R2s: [0.10370201748019414, -0.024446271196854941, 0.10307940713282349, 
-0.019807866066520852, 0.040822101620644258, 0.087468883241032636, 0.17386339833109343]

hugoguh avatar May 11 '16 16:05 hugoguh

If I switch the order of the reg_lambdas, it works again: if I give it reg_lambda=[0.5, 0.2, 0.1, 0.01, 0.001, 0.0001, 0] instead of reg_lambda=[0, 0.0001, 0.001, 0.01, 0.1, 0.2, 0.5], then it works.
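For reference, a minimal sketch of the ordering that works here (the same constructor call as above, only with the reg_lambda values reversed):

from pyglmnet import GLM

# descending order lets each lambda warm-start from the previous, larger one
model = GLM(distr='poissonexp', verbose=False, alpha=0.05,
            reg_lambda=[0.5, 0.2, 0.1, 0.01, 0.001, 0.0001, 0])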

So my guess is that it is working well in this particular example for the higher values of reg_lambda and then takes advantage of the warm restart. This raises two issues:

  • should converge better for any individual lambda;
  • if it doesn't, then warm restarts might be biasing results and hiding bugs/issues: we should always compare warm restarts vs non-warm restarts to check for biases, and we should have an option to switch off warm restarts while we are testing/developing pyglmnet, since they might be hiding bugs

hugoguh avatar May 11 '16 16:05 hugoguh

should converge better for any individual lambda

Maybe this could be a test?

if it doesn't, then warm restarts might be biasing results and hiding bugs/issues: we should always compare warm restarts vs non-warm restarts to check for biases, and we should have an option to switch off warm restarts while we are testing/developing pyglmnet, since they might be hiding bugs

Okay, we can do that too, but the option should not be in the public API. Maybe in a private function / method ...

jasmainak avatar May 11 '16 17:05 jasmainak

An alternative to switching off warm restarts is to run the fit for each individual lambda separately and compare it to running them all together.
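A rough sketch of that comparison, reusing Xr, yr, and scaler from the example above and assuming the same indexing API (model[i]) used there:

from pyglmnet import GLM

lambdas = [0.5, 0.2, 0.1, 0.01, 0.001, 0.0001]

# fit each lambda in isolation (no warm restart across lambdas)
single_fits = []
for lam in lambdas:
    m = GLM(distr='poissonexp', verbose=False, alpha=0.05, reg_lambda=[lam])
    m.fit(scaler.transform(Xr), yr)
    single_fits.append(m[-1].fit_)

# fit the whole path at once (with warm restarts)
path_model = GLM(distr='poissonexp', verbose=False, alpha=0.05, reg_lambda=lambdas)
path_model.fit(scaler.transform(Xr), yr)
path_fits = [path_model[i].fit_ for i in range(len(lambdas))]

# compare single_fits[i] against path_fits[i] for each lambda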

I haven't managed to get positive pseudo-R2s on the V4 neuron with the canonical link function in pyglmnet (distr='poissonexp'); I have tried several reg_lambdas. Using R with the canonical link I get pseudo-R2 = 0.0383 (+/- 0.0059). Using pyglmnet with the (non-canonical) default link function I get 0.0260 (+/- 0.0070).

(these values were obtained with reg_lambda=0)

hugoguh avatar May 11 '16 18:05 hugoguh

According to the paper, the logic of warm restarts doesn't work if we go from low to high lambda; they recommend fitting the largest lambda first and initializing the fit for the next lambda from it.
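A schematic of that recommendation (not pyglmnet's actual internals; fit_single stands in for a hypothetical single-lambda solver):

import numpy as np

def fit_path(X, y, reg_lambdas, fit_single):
    # fit from the largest to the smallest lambda, warm-starting
    # each fit at the previous solution
    betas = []
    beta = np.zeros(X.shape[1])
    for lam in sorted(reg_lambdas, reverse=True):
        beta = fit_single(X, y, lam, beta_init=beta)  # warm start
        betas.append(beta.copy())
    return betas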

Fix: the documentation should include this so that the user knows to specify lambdas in decreasing order.

@hugoguh shall I close this issue and create a new issue for documentation-related changes?

pavanramkumar avatar May 11 '16 19:05 pavanramkumar

Still, it should converge for a single reg_lambda:

import numpy as np
import scipy.sparse as sps
from sklearn.preprocessing import StandardScaler
from pyglmnet import GLM

model = GLM(distr='poissonexp', verbose=False, alpha=0.05, reg_lambda=[0.001])

n_samples, n_features = 10000, 100

# coefficients
beta0 = np.random.normal(0.0, 1.0, 1)
beta = sps.rand(n_features, 1, 0.1)
beta = np.array(beta.todense())

# training data
Xr = np.random.normal(0.0, 1.0, [n_samples, n_features])
yr = model.simulate(beta0, beta, Xr)

# testing data
Xt = np.random.normal(0.0, 1.0, [n_samples, n_features])
yt = model.simulate(beta0, beta, Xt)

# fit Generalized Linear Model
scaler = StandardScaler().fit(Xr)
model.fit(scaler.transform(Xr), yr)

# after .fit(), fit parameters are available for each reg_lambda on the path;
# here we take the last set of fit parameters
fit_param = model[-1].fit_

# we can use fitted parameters to predict
yhat = model.predict(scaler.transform(Xt))
print('reg_lambdas:', model.reg_lambda)
print('pseudo_R2s:', [model.pseudo_R2(yt, yh, np.mean(yr)) for yh in yhat])

Output:

reg_lambdas: [0.001]
pseudo_R2s: [-0.024434697354696278]

hugoguh avatar May 11 '16 19:05 hugoguh

I don’t think this issue should be closed.

It’s not about documenting the order of the lambdas when using warm starts (though that should be documented).

This issue was about convergence, which still fails in this case while R doesn't. So far I have tried pyglmnet on two real datasets and it has failed to converge on both:

  • failed on the V4 data (canonical Poisson) while R works
  • failed on the Electro imaging data (Gaussian with L1 regularization), for which sklearn's Lasso worked; it did better when we changed the learning rate but still with poor results. It needs a better optimization algorithm

hugoguh avatar May 11 '16 19:05 hugoguh

@hugoguh I agree we should iron out the convergence issues urgently. Did you manage to trace the source of the bug?

jasmainak avatar May 11 '16 20:05 jasmainak

@hugoguh agreed that we should make the outputs conform to sklearn and the R glmnet package for whichever use cases exist in those tools. I appreciate all the testing!

pavanramkumar avatar May 11 '16 20:05 pavanramkumar

Let's aim to get to the bottom of the bug.

In your bug report above, I wouldn't call it a convergence issue so much as a poor choice of regularization parameter. Note that the absolute value of the regularization parameter lambda cannot be directly compared between scikit-learn and pyglmnet, because the cost function (negative log-likelihood) is normalized slightly differently. Therefore the relative weight of the penalty term will differ between scikit-learn and pyglmnet for the same choice of lambda.

In your bug report above, the test set pseudo-R2 being negative simply indicates that the model has overfit to the training set. This can be verified by printing both the training set pseudo-R2 (R2r) and the test set pseudo-R2 (R2t) for the above example, for both scikit-learn and pyglmnet.

I would expect to see comparable R2r and R2t for scikit-learn, but a large positive R2r and a negative R2t for pyglmnet.
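For instance, one way to print both, reusing model, scaler, Xr, yr, Xt, and yt from the example above:

# training-set (R2r) vs test-set (R2t) pseudo-R2 for each lambda on the path
yhat_r = model.predict(scaler.transform(Xr))
yhat_t = model.predict(scaler.transform(Xt))
R2r = [model.pseudo_R2(yr, yh, np.mean(yr)) for yh in yhat_r]
R2t = [model.pseudo_R2(yt, yh, np.mean(yr)) for yh in yhat_t]
print('train pseudo_R2s:', R2r)
print('test pseudo_R2s:', R2t)
# a large positive R2r with a negative R2t would point to overfitting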

pavanramkumar avatar May 11 '16 20:05 pavanramkumar

As for the linear case with Roozbeh's data, we figured out that the fit improves a lot with a better choice of learning rate. We can only get to the bottom of this issue after the learning-rate-free Newton-CG solver has been implemented in issue #39. My goal is to solve #39 this weekend.

pavanramkumar avatar May 11 '16 20:05 pavanramkumar

@pavanramkumar we can compare them directly when setting reg_lambda to zero. R with the V4 data and sklearn with Roozbeh's data worked fine with reg_lambda=0, while pyglmnet didn't, so I think it is a convergence issue. I agree that we need the adaptive-learning-rate optimizer, and while it is not implemented (and also once it is), we should run a Newton solver on our cost function to check that the bug is only an optimization issue.
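One way to do that cross-check without waiting on #39: a minimal sketch that fits the unregularized canonical (exponential-link) Poisson GLM with scipy.optimize's Newton-CG solver (scipy rather than sklearn, since scipy exposes a generic second-order optimizer), reusing Xr, yr, and scaler from the examples above:

import numpy as np
from scipy.optimize import minimize

def poisson_nll(w, X, y):
    # canonical link: mu = exp(beta0 + X @ beta); the constant log(y!) term is dropped
    z = w[0] + X.dot(w[1:])
    return np.sum(np.exp(z) - y * z)

def poisson_grad(w, X, y):
    z = w[0] + X.dot(w[1:])
    resid = np.exp(z) - y
    return np.concatenate(([resid.sum()], X.T.dot(resid)))

Xs = scaler.transform(Xr)
w0 = np.zeros(Xs.shape[1] + 1)
res = minimize(poisson_nll, w0, args=(Xs, np.ravel(yr)), jac=poisson_grad,
               method='Newton-CG')
beta0_hat, beta_hat = res.x[0], res.x[1:]
# compare beta0_hat / beta_hat (and the resulting pseudo-R2) with pyglmnet's fit at reg_lambda=0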

hugoguh avatar May 11 '16 20:05 hugoguh

From here: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html

alpha : float
    Constant that multiplies the penalty terms. Defaults to 1.0. See the notes for the exact mathematical meaning of this parameter. alpha = 0 is equivalent to an ordinary least square, solved by the LinearRegression object. For numerical reasons, using alpha = 0 with the Lasso object is not advised and you should prefer the LinearRegression object.
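Illustrating that advice on the sklearn side with dummy data (just a sketch; the variable names are placeholders):

import numpy as np
from sklearn.linear_model import LinearRegression, Lasso

X = np.random.normal(size=(100, 5))
y = X.dot(np.array([1.0, 0.0, -2.0, 0.0, 0.5])) + np.random.normal(size=100)

# for the unregularized case, prefer the dedicated OLS solver ...
ols = LinearRegression().fit(X, y)
# ... over coordinate descent with a zero penalty, which the docs advise against
lasso0 = Lasso(alpha=0.0).fit(X, y)  # runs, but may warn and be numerically unstable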

pavanramkumar avatar May 12 '16 02:05 pavanramkumar

Turns out this is mainly an issue of the regularization path. See the conversation in #76. Should I create a new issue for "solution dependence on regularization path"?

pavanramkumar avatar May 12 '16 02:05 pavanramkumar

Let's just rename this issue so that all the conversation is in the same place?

jasmainak avatar May 12 '16 08:05 jasmainak