scikit-survival icon indicating copy to clipboard operation
scikit-survival copied to clipboard

Generic estimator of survival and cumulative hazard functions

Open coltsfan opened this issue 7 years ago • 8 comments

It seems the ensemble methods don't have these methods implemented. How do we generate survival function and cumulative hazard functions for the ensemble based survival methods?

coltsfan avatar Jan 14 '18 16:01 coltsfan

This feature has been requested before and I do plan to add in the next release (probably in February). In the meantime, please have a look at https://github.com/sebp/scikit-survival/issues/15#issuecomment-344757368 for alternative ways.

sebp avatar Jan 18 '18 20:01 sebp

Hi,

Are there any updates on implementing these functions for CoxnetSurvivalAnalysis?

Thanks.

mzhao94 avatar May 10 '18 00:05 mzhao94

Unfortunately no, I couldn't find the time to implement this yet. Any contributions would appreciated.

sebp avatar May 19 '18 09:05 sebp

Hi, I read through the comments about workarounds for this: specifically "using the predicted risk scores as a feature in a Cox model" to get predicted survival functions.

My question: is it appropriate to use the risk scores as the only feature in a Cox model?

I have tried to implement this and the results look sensible at first glance. Would be grateful if someone could take a look at my code below, and let me know if there is anything fundamentally wrong with this approach. Many thanks.

X = data[feature_cols]
y = data[['censor','time']]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1234)

estimator = GradientBoostingSurvivalAnalysis(subsample=0.95)
estimator.fit(X_train, y_train)

# predicted risk scores - reshaping for next step
y_hat_train = estimator.predict(X_train).reshape(-1, 1)
y_hat_test = estimator.predict(X_test).reshape(-1, 1)

# fit on risk scores from first step ONLY
cox_ph = CoxPHSurvivalAnalysis()
cox_ph.fit(y_hat_train, y_train)

surv_func_train = cox_ph.predict_survival_function(y_hat_train)
surv_func_test = cox_ph.predict_survival_function(y_hat_test)

tob789 avatar May 29 '18 20:05 tob789

Looks perfectly fine to me. As mentioned in https://github.com/sebp/scikit-survival/issues/15#issuecomment-344757368, the only downside is that this way is still subject to the proportional hazards assumption.

sebp avatar May 30 '18 20:05 sebp

Just curious-- the implementation of survival function and cumulative hazard function for generic survival models that you were hoping to get to: is that "using the predicted risk scores as a feature in a Cox model" or Van Belle et al.'s method as mentioned in #15, or yet some different methods? Thanks.

leihuang avatar Sep 26 '18 17:09 leihuang

I was planning on using the approach proposed by Van Belle et al. Unfortunately, due to other obligations, my time is limited at the moment.

sebp avatar Sep 27 '18 17:09 sebp

At least for GradientBoostingSurvivalAnalysis w/ coxph loss, you could do something similar to what's in lifelines:

def generate_baselines(self, X, y):
    X, event, time = check_arrays_survival(X, y, accept_sparse=['csr', 'csc', 'coo'], dtype=DTYPE)
    
    ind_hazards = pd.DataFrame({
        "P": np.exp(self.predict(X)),
        "E": event
        "T": time
    })

    ind_hazards_summed_over_durations = ind_hazards.groupby("T")[["P", "E"]].sum()
    ind_hazards_summed_over_durations["P"] = (
        ind_hazards_summed_over_durations["P"].loc[::-1].cumsum()
    )

    baseline_hazard = ind_hazards_summed_over_durations["E"] / ind_hazards_summed_over_durations["P"]

    baseline_cumulative_hazard = baseline_hazard.cumsum()

    baseline_survival = np.exp(-baseline_cumulative_hazard)
    
    return baseline_hazard, baseline_cumulative_hazard, baseline_survival

From which survival = baseline_survival.values ** hazard_ratio

rfdearborn avatar Sep 05 '19 02:09 rfdearborn