
larger training set deviance for smaller values of reg_lambda: bug in convergence criterion?

Open · hugoguh opened this issue on May 09, 2016 · 4 comments

I noticed it while working on cvpyglmnet:

[screenshot: deviance vs. log(Lambda) curves from cvpyglmnet]

If you look at the training performance, there is some weird behavior: the training set deviance is supposed to always go down as reg_lambda approaches zero (i.e. as log(Lambda) becomes more negative), but here it goes back up.
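In other words, along the regularization path (which is fit from the largest reg_lambda down to the smallest) the training deviance should be non-increasing. Here is a small check to flag violations; find_deviance_increases is just a throwaway helper I wrote for this issue, not part of pyglmnet:

import numpy as np

def find_deviance_increases(dev_path, tol=1e-8):
    # indices i where the training deviance increases from reg_lambda[i]
    # to reg_lambda[i + 1]; on the training set this should never happen
    dev_path = np.asarray(dev_path)
    return np.where(np.diff(dev_path) > tol)[0]

Running it on the dev_t computed below flags the indices where the curve ticks upward.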

Here's some code to replicate it. It doesn't happen every time; for instance, with np.random.seed(0) you may not see it.

import numpy as np
import scipy.sparse as sps
from sklearn.preprocessing import StandardScaler
from pyglmnet import GLM
np.random.seed(42)

# create an instance of the GLM class
reg_lambda = np.exp(np.linspace(-10, -3, num=100))
model = GLM(distr='poisson', verbose=False, alpha=0.95, reg_lambda=reg_lambda)

n_samples, n_features = 10000, 100

# coefficients
beta0 = np.random.normal(0.0, 1.0, 1)
beta = sps.rand(n_features, 1, 0.1)
beta = np.array(beta.todense())

# training data
Xr = np.random.normal(0.0, 1.0, [n_samples, n_features])
yr = model.simulate(beta0, beta, Xr)

# testing data
Xt = np.random.normal(0.0, 1.0, [n_samples, n_features])
yt = model.simulate(beta0, beta, Xt)

# standardize features before fitting the Generalized Linear Model
scaler = StandardScaler().fit(Xr)

Fit and compute the deviance. I use the average deviance; it only differs from the plain deviance by a scalar factor, but it is a better measure when cross-validating because different folds can have different numbers of samples. Use the commented-out line if you want the plain deviance.

yrhat = model.fit_predict(scaler.transform(Xr), yr)
# plain deviance:
# dev_t = [model.deviance(yr, i) for i in yrhat]
# average deviance: divide by n_samples (yrhat has one row of predictions per reg_lambda)
dev_t = [model.deviance(yr, i) / float(np.shape(yrhat)[1]) for i in yrhat]
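For reference, by deviance I mean the usual Poisson deviance, and the average deviance is just that divided by the number of samples. A quick sketch of the formula I have in mind (the textbook definition, not necessarily the exact expression pyglmnet computes internally):

def poisson_deviance(y, yhat, average=False):
    # D = 2 * sum( y * log(y / yhat) - (y - yhat) ), with 0 * log(0) taken as 0
    y = np.asarray(y, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    ratio = np.where(y > 0, y / yhat, 1.0)  # the y == 0 terms contribute 0 to the log part
    dev = 2.0 * np.sum(y * np.log(ratio) - (y - yhat))
    return dev / y.shape[0] if average else dev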

Now plot:

%matplotlib inline
import matplotlib.pyplot as plt
upto = 60
plt.plot(np.log(model.reg_lambda[0:upto]), dev_t[0:upto], '-o', c='k')
plt.xlabel('log(Lambda)')
plt.ylabel('Poisson Deviance')

Here's the output: [screenshot: training set Poisson deviance vs. log(Lambda)]

Play around with the range of reg_lambda values and also try other random seeds (different simulated Xr and yr).

Any idea where this is coming from? At first I thought it was a warm-start effect, but @pavanramkumar says the path starts by fitting the larger reg_lambdas. One possible hint: it does not seem to depend on the actual value of reg_lambda; when it happens, it always seems to happen towards the end of the path. For example, here is what I get with a slightly different range of reg_lambdas when instantiating the model (a sketch of the convergence-criterion idea follows the screenshot):

reg_lambda = np.exp(np.linspace(-8, -3, num=100))

[screenshot: training set Poisson deviance vs. log(Lambda) for the reg_lambda range above]
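To make the convergence-criterion hypothesis in the title concrete: if the solver stops as soon as the relative change in the objective between iterations drops below a fixed tolerance, then with warm starts the first iterations at a new (smaller) reg_lambda can already look "converged", because the warm-started solution barely moves, and the fit exits before realizing the small extra improvement the weaker penalty allows. Here is a toy illustration of that mechanism (plain ridge regression with gradient descent, not pyglmnet's actual solver or stopping rule):

import numpy as np

rng = np.random.RandomState(42)
X = rng.randn(200, 5)
y = np.dot(X, rng.randn(5)) + 0.1 * rng.randn(200)

def fit_ridge_gd(X, y, lam, w0, lr=0.1, tol=1e-5, max_iter=5000):
    # gradient descent with a relative-change stopping rule
    w = w0.copy()
    resid = np.dot(X, w) - y
    loss_old = 0.5 * np.mean(resid ** 2) + 0.5 * lam * np.dot(w, w)
    for _ in range(max_iter):
        grad = np.dot(X.T, resid) / len(y) + lam * w
        w = w - lr * grad
        resid = np.dot(X, w) - y
        loss = 0.5 * np.mean(resid ** 2) + 0.5 * lam * np.dot(w, w)
        if abs(loss_old - loss) / loss_old < tol:
            break  # declared "converged" as soon as the loss barely changes
        loss_old = loss
    return w

# warm-started path from large to small lambda: towards the tail of the path
# the warm start is already close to the new optimum, so the stopping rule
# fires almost immediately and the training error plateaus instead of
# improving the way a fully converged fit at a weaker penalty would
w = np.zeros(X.shape[1])
for lam in np.exp(np.linspace(-3, -10, 20)):
    w = fit_ridge_gd(X, y, lam, w0=w)
    print('lambda = %.2e, train MSE = %.6f' % (lam, np.mean((np.dot(X, w) - y) ** 2)))

If pyglmnet's inner loop uses a similar relative-change test, premature exits like this towards the end of a warm-started path would at least be consistent with the effect not depending on the actual reg_lambda values.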

hugoguh · May 09 '16 19:05