glmnet-python: segfault when training elastic net / lasso on wide problems

When the number of features is much larger than the number of samples (a "wide" problem), I get a segmentation fault. The following script reproduces the problem:

import numpy as np
from glmnet.elastic_net import Lasso

# problem dim
n_samples = 100
n_features = 100000
n_informative_features = 10

# normally distributed input signal
X = np.random.randn(n_samples, n_features)

# generate a ground truth model with only the first 10 features being non-zero
# (the other features are not correlated with Y and should be ignored by
# the L1 regularizer)
true_coef = np.zeros(n_features)
true_coef[:n_informative_features] = np.random.randn(n_informative_features)

# generate the ground truth Y from the reference model and X + label noise
Y = np.dot(X, true_coef) + np.random.normal(scale=0.1, size=n_samples)

print(Lasso(alpha=1).fit(X, Y))
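For scale, here is a quick calculation of how much memory the dense float64 design matrix X in this script occupies. It comes to about 80 MB, so X alone is unlikely to exhaust memory on a typical machine. Whether the wrapper or the Fortran routine allocates extra per-feature workspace on top of that is an open question and is not established here.

```python
import numpy as np

n_samples = 100
n_features = 100000

# bytes needed for the dense float64 design matrix X (no copies counted)
x_bytes = n_samples * n_features * np.dtype(np.float64).itemsize
print(x_bytes)          # 80000000
print(x_bytes / 1e6)    # 80.0 (MB)
```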

Jul 20 '10 12:07 ogrisel

Can you try with the R glmnet package? The problem might just be in the underlying Fortran code...
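glmnet fits the lasso by cyclic coordinate descent with soft-thresholding. As a cross-check that needs neither R nor the Fortran code, here is a minimal pure-NumPy sketch of that technique. Assumptions: the objective is (1/(2n))·||y − Xb||² + alpha·||b||₁, with no intercept and no standardization, so the coefficients will not match glmnet's defaults exactly; `lasso_cd` and `soft_threshold` are hypothetical helper names, not part of glmnet-python.

```python
import numpy as np


def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)


def lasso_cd(X, y, alpha, n_iter=100, tol=1e-6):
    """Hypothetical helper: cyclic coordinate descent for the lasso objective

        (1 / (2 * n)) * ||y - X b||^2 + alpha * ||b||_1

    Assumes no intercept and no standardization (unlike glmnet's defaults).
    """
    n_samples, n_features = X.shape
    b = np.zeros(n_features)
    residual = np.array(y, dtype=float)          # r = y - X b, with b = 0
    col_sq = (X ** 2).sum(axis=0) / n_samples    # (1/n) * ||x_j||^2 per column
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(n_features):
            if col_sq[j] == 0.0:
                continue  # all-zero column: its coefficient stays at 0
            old = b[j]
            # (1/n) x_j^T (r + x_j b_j): correlation with the partial residual
            rho = X[:, j].dot(residual) / n_samples + col_sq[j] * old
            new = soft_threshold(rho, alpha) / col_sq[j]
            if new != old:
                residual -= X[:, j] * (new - old)  # keep r = y - X b in sync
                b[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break  # no coordinate moved more than tol in a full sweep
    return b
```

For example, `b = lasso_cd(X, Y, alpha=1, n_iter=5)` on the arrays from the script above should finish without crashing, although slowly, since it loops over 100,000 columns in Python. If it does while `Lasso(alpha=1).fit(X, Y)` segfaults, that suggests the problem lies in the wrapper or the Fortran routine rather than in the data.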

Jul 22 '10 03:07 dwf

I am not familiar with R, but I will give it a shot as soon as I receive the "R in a Nutshell" book from Amazon.

Jul 22 '10 07:07 ogrisel