[Question] Cox model for ultra-high dimensional data

Open EQUIWDH opened this issue 2 years ago • 2 comments

Hello, I am analyzing real data with a high-dimensional Cox model. My dataset has shape 240 × 7000. When I fit abess.CoxPHSurvivalAnalysis() with cross-validation, it does not select any features, so I have to apply screening before running abess for the Cox model. I also ran a simulation that tests only the screening step of the abess package and found that the screened set does not contain all of the true features generated by make_glm_data. This makes me doubt the screening algorithm in the package. I hope you can look into it and improve it. Thank you!

EQUIWDH · Nov 15 '22 08:11

Can you provide minimal code to reproduce your report? Also, are your results consistent with this paper: Principled sure independence screening for Cox models with ultra-high-dimensional covariates?

Mamba413 · Nov 15 '22 15:11

Sorry about that. Here is the simulation code, run in a Jupyter notebook. The screening in the abess package does not perform as well as the method in the Cox-PSIS paper: the screening method in the paper retains almost all of the true features regardless of how many features you choose to keep. I have attached a picture of the results.

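from abess import make_glm_data
from abess import CoxPHSurvivalAnalysis
import numpy as np

# Simulate Cox data: n = 240 samples, p = 7000 features, k = 20 nonzero coefficients
sim = make_glm_data(n=240, p=7000, k=20, family='cox', rho=0.5, c=60)
# Indices of the true (nonzero) coefficients
indice_real = np.flatnonzero(sim.coef_)
print(indice_real)

# max_iter=0 with screening_size=1000 is intended to isolate the screening step
cox = CoxPHSurvivalAnalysis(max_iter=0, screening_size=1000, support_size=1000)
cox.fit(sim.x, sim.y)
# Indices of the features with nonzero fitted coefficients
indice_sc = np.flatnonzero(cox.coef_)

# True features that are also in the selected set
inter = np.intersect1d(indice_sc, indice_real)
print(inter)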
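
To compare against the paper more directly, the overlap printed above can be expressed as a recall: the fraction of true features that survive screening. The following is only a minimal sketch in plain Python. The helper names `screening_recall` and `recall_curve` are hypothetical and not part of abess, and the indices in the toy example are made up rather than taken from any abess run.

```python
def screening_recall(selected, true_support):
    """Fraction of truly active features that are present in `selected`."""
    true_set = set(int(i) for i in true_support)
    if not true_set:
        raise ValueError("true_support must not be empty")
    kept = true_set.intersection(int(i) for i in selected)
    return len(kept) / len(true_set)


def recall_curve(ranking, true_support, sizes):
    """Recall when keeping the top-d entries of `ranking`, for each d in `sizes`.

    `ranking` is assumed to list feature indices from most to least relevant.
    """
    return {d: screening_recall(ranking[:d], true_support) for d in sizes}


# Toy illustration with made-up indices (hypothetical, not abess output).
true_support = [2, 5, 9]
ranking = [5, 1, 2, 7, 9, 0]
curve = recall_curve(ranking, true_support, sizes=[1, 3, 5])
print(curve)
```

With the simulation script, `screening_recall(indice_sc, indice_real)` would give the share of the 20 true features retained at `screening_size=1000`. Repeating the fit for several screening sizes gives a curve that can be set beside the results reported in the paper.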

[Attached image: choose]

EQUIWDH · Nov 16 '22 05:11