scikit-survival
scikit-survival copied to clipboard
Generic estimator of survival and cumulative hazard functions
It seems the ensemble methods don't have these methods implemented. How do we generate survival function and cumulative hazard functions for the ensemble based survival methods?
This feature has been requested before and I do plan to add in the next release (probably in February). In the meantime, please have a look at https://github.com/sebp/scikit-survival/issues/15#issuecomment-344757368 for alternative ways.
Hi,
Are there any updates on implementing these functions for CoxnetSurvivalAnalysis?
Thanks.
Unfortunately no, I couldn't find the time to implement this yet. Any contributions would appreciated.
Hi, I read through the comments about workarounds for this: specifically "using the predicted risk scores as a feature in a Cox model" to get predicted survival functions.
My question: is it appropriate to use the risk scores as the only feature in a Cox model?
I have tried to implement this and the results look sensible at first glance. Would be grateful if someone could take a look at my code below, and let me know if there is anything fundamentally wrong with this approach. Many thanks.
X = data[feature_cols]
y = data[['censor','time']]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1234)
estimator = GradientBoostingSurvivalAnalysis(subsample=0.95)
estimator.fit(X_train, y_train)
# predicted risk scores - reshaping for next step
y_hat_train = estimator.predict(X_train).reshape(-1, 1)
y_hat_test = estimator.predict(X_test).reshape(-1, 1)
# fit on risk scores from first step ONLY
cox_ph = CoxPHSurvivalAnalysis()
cox_ph.fit(y_hat_train, y_train)
surv_func_train = cox_ph.predict_survival_function(y_hat_train)
surv_func_test = cox_ph.predict_survival_function(y_hat_test)
Looks perfectly fine to me. As mentioned in https://github.com/sebp/scikit-survival/issues/15#issuecomment-344757368, the only downside is that this way is still subject to the proportional hazards assumption.
Just curious-- the implementation of survival function and cumulative hazard function for generic survival models that you were hoping to get to: is that "using the predicted risk scores as a feature in a Cox model" or Van Belle et al.'s method as mentioned in #15, or yet some different methods? Thanks.
I was planning on using the approach proposed by Van Belle et al. Unfortunately, due to other obligations, my time is limited at the moment.
At least for GradientBoostingSurvivalAnalysis w/ coxph loss, you could do something similar to what's in lifelines:
def generate_baselines(self, X, y):
X, event, time = check_arrays_survival(X, y, accept_sparse=['csr', 'csc', 'coo'], dtype=DTYPE)
ind_hazards = pd.DataFrame({
"P": np.exp(self.predict(X)),
"E": event
"T": time
})
ind_hazards_summed_over_durations = ind_hazards.groupby("T")[["P", "E"]].sum()
ind_hazards_summed_over_durations["P"] = (
ind_hazards_summed_over_durations["P"].loc[::-1].cumsum()
)
baseline_hazard = ind_hazards_summed_over_durations["E"] / ind_hazards_summed_over_durations["P"]
baseline_cumulative_hazard = baseline_hazard.cumsum()
baseline_survival = np.exp(-baseline_cumulative_hazard)
return baseline_hazard, baseline_cumulative_hazard, baseline_survival
From which survival = baseline_survival.values ** hazard_ratio