scikit-survival Generic estimator of survival and cumulative hazard functions

It seems the ensemble methods don't have these methods implemented. How do we generate survival function and cumulative hazard functions for the ensemble based survival methods?

Jan 14 '18 16:01 coltsfan

This feature has been requested before and I do plan to add in the next release (probably in February). In the meantime, please have a look at https://github.com/sebp/scikit-survival/issues/15#issuecomment-344757368 for alternative ways.

Jan 18 '18 20:01 sebp

Hi,

Are there any updates on implementing these functions for CoxnetSurvivalAnalysis?

Thanks.

May 10 '18 00:05 mzhao94

Unfortunately no, I couldn't find the time to implement this yet. Any contributions would appreciated.

May 19 '18 09:05 sebp

Hi, I read through the comments about workarounds for this: specifically "using the predicted risk scores as a feature in a Cox model" to get predicted survival functions.

My question: is it appropriate to use the risk scores as the only feature in a Cox model?

I have tried to implement this and the results look sensible at first glance. Would be grateful if someone could take a look at my code below, and let me know if there is anything fundamentally wrong with this approach. Many thanks.

X = data[feature_cols]
y = data[['censor','time']]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1234)

estimator = GradientBoostingSurvivalAnalysis(subsample=0.95)
estimator.fit(X_train, y_train)

# predicted risk scores - reshaping for next step
y_hat_train = estimator.predict(X_train).reshape(-1, 1)
y_hat_test = estimator.predict(X_test).reshape(-1, 1)

# fit on risk scores from first step ONLY
cox_ph = CoxPHSurvivalAnalysis()
cox_ph.fit(y_hat_train, y_train)

surv_func_train = cox_ph.predict_survival_function(y_hat_train)
surv_func_test = cox_ph.predict_survival_function(y_hat_test)

May 29 '18 20:05 tob789

Looks perfectly fine to me. As mentioned in https://github.com/sebp/scikit-survival/issues/15#issuecomment-344757368, the only downside is that this way is still subject to the proportional hazards assumption.

May 30 '18 20:05 sebp

Just curious-- the implementation of survival function and cumulative hazard function for generic survival models that you were hoping to get to: is that "using the predicted risk scores as a feature in a Cox model" or Van Belle et al.'s method as mentioned in #15, or yet some different methods? Thanks.

Sep 26 '18 17:09 leihuang

I was planning on using the approach proposed by Van Belle et al. Unfortunately, due to other obligations, my time is limited at the moment.

Sep 27 '18 17:09 sebp

At least for GradientBoostingSurvivalAnalysis w/ coxph loss, you could do something similar to what's in lifelines:

def generate_baselines(self, X, y):
    X, event, time = check_arrays_survival(X, y, accept_sparse=['csr', 'csc', 'coo'], dtype=DTYPE)
    
    ind_hazards = pd.DataFrame({
        "P": np.exp(self.predict(X)),
        "E": event
        "T": time
    })

    ind_hazards_summed_over_durations = ind_hazards.groupby("T")[["P", "E"]].sum()
    ind_hazards_summed_over_durations["P"] = (
        ind_hazards_summed_over_durations["P"].loc[::-1].cumsum()
    )

    baseline_hazard = ind_hazards_summed_over_durations["E"] / ind_hazards_summed_over_durations["P"]

    baseline_cumulative_hazard = baseline_hazard.cumsum()

    baseline_survival = np.exp(-baseline_cumulative_hazard)
    
    return baseline_hazard, baseline_cumulative_hazard, baseline_survival

From which survival = baseline_survival.values ** hazard_ratio

Sep 05 '19 02:09 rfdearborn

scikit-survival scikit-survival copied to clipboard

Generic estimator of survival and cumulative hazard functions

scikit-survival
scikit-survival copied to clipboard