pyglmnet
larger training set deviance for smaller values of reg_lambda: bug in convergence criterion?
I noticed this while working on cvpyglmnet.

There is some odd behavior in the training performance: the training set deviance is supposed to always go down as reg_lambda approaches zero (i.e. as log(Lambda) becomes more negative), but here it doesn't. Here's some code for you to replicate it (it doesn't always happen; for instance, try np.random.seed(0)):
```python
import numpy as np
import scipy.sparse as sps
from sklearn.preprocessing import StandardScaler
import scikits.bootstrap as boot
from pyglmnet import GLM

np.random.seed(42)

# create an instance of the GLM class
reg_lambda = np.exp(np.linspace(-10, -3, num=100))
model = GLM(distr='poisson', verbose=False, alpha=0.95, reg_lambda=reg_lambda)

n_samples, n_features = 10000, 100

# coefficients
beta0 = np.random.normal(0.0, 1.0, 1)
beta = sps.rand(n_features, 1, 0.1)
beta = np.array(beta.todense())

# training data
Xr = np.random.normal(0.0, 1.0, [n_samples, n_features])
yr = model.simulate(beta0, beta, Xr)

# testing data
Xt = np.random.normal(0.0, 1.0, [n_samples, n_features])
yt = model.simulate(beta0, beta, Xt)

# fit Generalized Linear Model
scaler = StandardScaler().fit(Xr)
```
Fit and compute the deviance. I use the average deviance: it differs only by a scalar factor, but it's a better measure when cross-validating, because different folds might have different numbers of elements. Use the commented line for the plain deviance:
```python
yrhat = model.fit_predict(scaler.transform(Xr), yr)
# dev_t = [model.deviance(yr, i) for i in yrhat]
dev_t = [model.deviance(yr, i) / float(np.shape(yrhat)[1]) for i in yrhat]
```
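For reference, the quantity being averaged here is the Poisson deviance, D(y, mu) = 2 * sum(y * log(y / mu) - (y - mu)). As a sanity check on the averaging, scikit-learn's `mean_poisson_deviance` computes the per-sample average directly; a small sketch (the toy arrays below are made up for illustration):

```python
import numpy as np
from sklearn.metrics import mean_poisson_deviance

y = np.array([1.0, 3.0, 0.0, 2.0])    # observed counts
mu = np.array([1.2, 2.5, 0.4, 2.1])   # predicted means

# Mean Poisson deviance: (2/n) * sum(y*log(y/mu) - (y - mu)),
# with the convention that y*log(y/mu) = 0 when y == 0.
with np.errstate(divide='ignore', invalid='ignore'):
    term = np.where(y > 0, y * np.log(y / mu), 0.0)
manual = 2.0 * np.mean(term - (y - mu))

assert np.isclose(manual, mean_poisson_deviance(y, mu))
```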
Now plot:
```python
%matplotlib inline
import matplotlib.pyplot as plt

upto = 60
plt.plot(np.log(model.reg_lambda[0:upto]), dev_t[0:upto], '-o', c='k')
plt.xlabel('log(Lambda)')
plt.ylabel('Poisson Deviance')
```
Here’s the output:
Play around with the range of reg_lambda values, and also try other values of np.random.seed (different simulated Xr and yr).
Any idea where it is coming from? At first I thought it was a warm-start effect, but @pavanramkumar says the solver starts by fitting the larger reg_lambda values.
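For context on the warm-start idea: pathwise solvers typically fit the largest penalty first and reuse each solution as the initialization for the next, smaller one, so training error should only improve along the path. A minimal sketch of that pattern using scikit-learn's ElasticNet as a stand-in solver (not pyglmnet's actual implementation):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 20))
y = X @ rng.normal(size=20) + 0.1 * rng.normal(size=200)

# Descending path: largest penalty first, as described above.
alphas = np.exp(np.linspace(-1, -6, num=10))

model = ElasticNet(alpha=alphas[0], warm_start=True, max_iter=10000)
train_mse = []
for a in alphas:
    model.set_params(alpha=a)  # warm_start=True reuses the previous coef_
    model.fit(X, y)
    train_mse.append(np.mean((model.predict(X) - y) ** 2))

# For this well-conditioned problem, the training error drops
# as the penalty shrinks along the path.
```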
This might give a hint: it doesn't seem to depend on the actual value of reg_lambda. When it happens, it always seems to happen towards the end of the path. See what happens if I use a slightly different range of reg_lambda values when instantiating the model:

```python
reg_lambda = np.exp(np.linspace(-8, -3, num=100))
```
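As a quick way to quantify "towards the end", one can flag the indices where the deviance path stops decreasing; a sketch on a synthetic path (`dev_path` below is made up to mimic the shape in the plots, not computed from the model):

```python
import numpy as np

# Hypothetical deviance path that rises near the end of the lambda grid.
dev_path = np.concatenate([np.linspace(2.0, 1.0, 90),
                           np.linspace(1.0, 1.3, 10)])

# Indices where the deviance increases from one reg_lambda to the next.
bad = np.flatnonzero(np.diff(dev_path) > 0)
print(bad)  # all offenders sit in the last stretch of the path
```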