abess
[Question] Cox model for ultra-high dimensional data
Hello, I am doing a real-data analysis with a high-dimensional Cox model. My dataset has shape 240 × 7000 (240 samples, 7000 features). However, when I use `abess.CoxPHSurvivalAnalysis()` with cross-validation, it does not select any features, so I have to run a screening step before fitting the Cox model with abess. I also ran a simulation that tests only the screening method in the abess package. I found that the screened set does not contain all of the true features generated by `make_glm_data`. So I have doubts about the screening algorithm in this package, and I hope you can improve it. Thank you!
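For reference, a marginal (sure independence) screening step for a Cox model ranks every covariate by a one-variable Cox statistic and keeps the top ones. The sketch below illustrates that idea in general terms. It is not abess's screening algorithm, and it is not the PSIS procedure from the paper cited below. It assumes a score-test ranking: the partial-likelihood score at beta_j = 0 divided by its information, with Breslow handling of tied times. The function name `cox_score_screening` and the simulated data in the demo are hypothetical.

```python
import numpy as np


def cox_score_screening(x, time, event, size):
    """Rank features by a marginal Cox score test and keep the top `size`.

    Hypothetical helper for illustration (not abess's algorithm). For each
    feature j it computes the partial-likelihood score U_j and information I_j
    of the one-variable Cox model at beta_j = 0 (Breslow handling of ties)
    and ranks features by U_j**2 / I_j, which does not depend on x's scale.
    """
    x = np.asarray(x, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event).astype(bool)

    # Sort samples by decreasing time: the risk set {k : time_k >= time_i}
    # then becomes a prefix of the sorted arrays.
    order = np.argsort(-time, kind="stable")
    x, time, event = x[order], time[order], event[order]

    # Last sorted index whose time is >= time_i (extends prefixes over ties).
    last = np.searchsorted(-time, -time, side="right") - 1

    # Prefix sums give risk-set sums for every feature at once.
    s1 = np.cumsum(x, axis=0)
    s2 = np.cumsum(x ** 2, axis=0)

    r = last[event]                      # risk-set end for each event
    n_risk = (r + 1)[:, None]            # risk-set sizes
    mean = s1[r] / n_risk                # risk-set mean of each feature
    var = s2[r] / n_risk - mean ** 2     # risk-set variance of each feature

    score = (x[event] - mean).sum(axis=0)   # U_j
    info = var.sum(axis=0)                  # I_j
    stat = score ** 2 / np.maximum(info, 1e-12)

    # Indices of the `size` largest statistics, best first.
    return np.argsort(-stat, kind="stable")[:size]


if __name__ == "__main__":
    # Hypothetical simulation: 20 active features with coefficient 0.5.
    rng = np.random.default_rng(0)
    n, p, k = 240, 2000, 20
    x = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:k] = 0.5
    # Exponential event times with hazard exp(x @ beta), plus random censoring.
    t_event = rng.exponential(1.0 / np.exp(x @ beta))
    t_cens = rng.exponential(2 * np.median(t_event), size=n)
    time = np.minimum(t_event, t_cens)
    event = (t_event <= t_cens).astype(int)
    kept = cox_score_screening(x, time, event, size=100)
    print("true features retained:", np.intersect1d(kept, np.arange(k)).size, "of", k)
```

After sorting, the statistic for all p features comes from cumulative sums. The cost is therefore roughly O(n p), which stays cheap even at p = 7000.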
Can you provide a minimal code example that reproduces what you report? Also, are your results consistent with this paper: Principled sure independence screening for Cox models with ultra-high-dimensional covariates?
Sorry about that. Here is the simulation code, which I ran in a Jupyter notebook. The screening in the abess package does not perform as well as the method in the Cox-PSIS paper. The paper's screening method retains almost all of the true features, no matter how many features you choose to keep. I have attached a figure with the results.
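```python
from abess import make_glm_data
from abess import CoxPHSurvivalAnalysis
import numpy as np

# Simulate Cox data: 240 samples, 7000 features, 20 of them truly active
sim = make_glm_data(n=240, p=7000, k=20, family='cox', rho=0.5, c=60)
# Indices of the true (nonzero-coefficient) features
indice_real = np.array(np.where(sim.coef_ != 0)).reshape(-1)
print(indice_real)

# Intended to isolate the screening step: keep 1000 features, no splicing iterations
cox = CoxPHSurvivalAnalysis(max_iter=0, screening_size=1000, support_size=1000)
cox.fit(sim.x, sim.y)
# Features kept after screening, and the true features among them
indice_sc = np.array(np.where(cox.coef_ != 0)).reshape(-1)
inter = np.intersect1d(indice_sc, indice_real)
print(inter)
```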
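The comparison with the Cox-PSIS paper concerns how many true features survive as the number of kept features grows. A method that retains almost all true features at any size has a coverage near 1 at every size. Below is a minimal sketch of that coverage measure. It assumes the screening method yields a full ranking of features. With abess, the snippet above would instead be rerun once per `screening_size`. The helper name `coverage_curve` is hypothetical.

```python
import numpy as np


def coverage_curve(ranking, true_idx, sizes):
    """Fraction of the true features found among the top-s ranked features, for each s.

    Hypothetical helper. `ranking` lists feature indices from most to least
    important under some screening statistic; `true_idx` holds the indices of
    the truly active features (e.g. np.flatnonzero(sim.coef_)).
    """
    ranking = np.asarray(ranking)
    true_idx = np.asarray(true_idx)
    return {s: np.isin(ranking[:s], true_idx).sum() / true_idx.size for s in sizes}


# Toy ranking: true features 0, 1, 2 are ranked 2nd, 4th and 6th.
curve = coverage_curve([3, 0, 7, 1, 9, 2], [0, 1, 2], [1, 2, 4, 6])
for s, frac in curve.items():
    print(f"top {s}: {frac:.2f} of true features retained")
# prints 0.00, 0.33, 0.67, 1.00 for s = 1, 2, 4, 6
```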