glmnet-python
Segfault when training elastic net / lasso with wide problems
When the number of features is much larger than the number of samples, I get a segmentation fault. The following script reproduces the problem:
import numpy as np
from glmnet.elastic_net import Lasso

# problem dimensions
n_samples = 100
n_features = 100000
n_informative_features = 10

# normally distributed input signal
X = np.random.randn(n_samples, n_features)

# generate a ground truth model with only the first 10 features being
# non-zero (the other features are not correlated to Y and should be
# ignored by the L1 regularizer)
true_coef = np.zeros(n_features)
true_coef[:n_informative_features] = np.random.randn(n_informative_features)

# generate the ground truth Y from the reference model and X + label noise
Y = np.dot(X, true_coef) + np.random.normal(scale=0.1, size=n_samples)

# parenthesized form works as a print statement or a print function
print(Lasso(alpha=1).fit(X, Y))
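To narrow down the feature count at which the crash starts, one option is to run each fit in a child interpreter, so a segfault kills only the child and the parent can report it. The following is a minimal sketch, not a definitive diagnostic: it assumes a POSIX system (where `subprocess` reports death by signal as a negative return code) and the same `glmnet.elastic_net.Lasso` import as in the script above. The names `FIT_TEMPLATE`, `run_snippet`, and `describe`, the simplified random target, and the feature counts are illustrative choices, not part of glmnet-python.

```python
import subprocess
import sys

# Hypothetical template: a simplified version of the report's setup with the
# feature count as a parameter. The target here is plain noise (assumption:
# the crash depends on the problem shape rather than on the values of Y).
FIT_TEMPLATE = """
import numpy as np
from glmnet.elastic_net import Lasso
X = np.random.randn(100, {n_features})
Y = np.random.randn(100)
Lasso(alpha=1).fit(X, Y)
"""


def run_snippet(source):
    """Run Python source in a child interpreter and return its exit code."""
    return subprocess.call([sys.executable, "-c", source])


def describe(returncode):
    """Turn a child's return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative code -N means the child was killed by signal N.
        return "killed by signal %d" % -returncode
    return "exited with status %d" % returncode


if __name__ == "__main__":
    # Illustrative feature counts; adjust to bracket the failing size.
    for n_features in (1000, 10000, 100000):
        code = run_snippet(FIT_TEMPLATE.format(n_features=n_features))
        print("n_features=%d: %s" % (n_features, describe(code)))
```

A segmentation fault would typically show up as "killed by signal 11" on Linux, while a missing glmnet install would show up as a non-zero exit status with an ImportError traceback from the child.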
Can you try with the R glmnet package? It might just be the Fortran code...
I am not familiar with R, but I will give it a shot as soon as I receive the "R in a Nutshell" book from Amazon.