c-index in bootstraps
I am using a bootstrap approach for model selection. I think it would be great if we could decide how concordance_index treats ties. Currently, it puts 0.5 weight on tied pairs. In bootstraps you have a lot of ties, and then the c-index deteriorates dramatically.
For example, the implementation in R lets you decide whether to remove the tied pairs or count them as concordant:
https://www.rdocumentation.org/packages/pec/versions/2020.11.17/topics/cindex
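To make the request concrete, here is a minimal, stdlib-only sketch of a naive O(n²) concordance index with a configurable tie strategy, mirroring the options in pec::cindex. The function name and the `ties` keyword are mine, not lifelines API, and tied event times are simply skipped here for brevity (lifelines treats them more carefully):

```python
from itertools import combinations

def cindex_with_ties(times, scores, events, ties="half"):
    """Naive O(n^2) concordance index with a configurable tie strategy.

    Convention (as in lifelines): a higher score predicts longer survival.
    Pairs with tied event times are skipped for simplicity.

    ties="half"       -> tied scores get 0.5 credit (lifelines' current behaviour)
    ties="drop"       -> tied pairs are removed from the denominator
    ties="concordant" -> tied pairs are counted as concordant
    """
    concordant = tied = admissible = 0
    for a, b in combinations(zip(times, scores, events), 2):
        # order the pair so that `a` has the earlier event time
        if a[0] > b[0]:
            a, b = b, a
        t_a, s_a, e_a = a
        t_b, s_b, _ = b
        # a pair is comparable only if the earlier time is an observed event
        if t_a == t_b or not e_a:
            continue
        admissible += 1
        if s_a == s_b:
            tied += 1
        elif s_a < s_b:  # the shorter survivor got the lower score: concordant
            concordant += 1
    if ties == "drop":
        return concordant / (admissible - tied)
    if ties == "concordant":
        return (concordant + tied) / admissible
    return (concordant + 0.5 * tied) / admissible  # default: "half"
```

With one tied pair among four fully observed subjects, ties="half" yields 5.5/6 ≈ 0.917, while both "drop" and "concordant" yield 1.0, which shows how much the tie policy alone can move the index.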
Hi @dbikiel - seems like a simple idea. To confirm: are you currently using lifelines.utils.concordance.concordance_index?
Yes, I am using that concordance_index.
To add some more info, because my previous answer may be a bit misleading: most of the problems I am observing may not be related directly to the ties (although adding the tie option to the package could still be a good idea)...
I did some more experiments. For example, using the rossi dataset I found that if I rely on the c-index obtained directly from the model (the one in print_summary), it does not look good, but if I recalculate the c-index using concordance_index it seems to be ok. I am pasting the code below:
from lifelines.datasets import load_rossi
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index
import numpy as np
import tqdm
import warnings
warnings.filterwarnings('ignore')
def create_bootstrap_data(df, n=100):
    """
    Given a df, create n (default = 100) bootstrap resamples.
    Returns an array with the indexes (samples x n).
    """
    bt_indexes = np.random.choice(df.index, (len(df.index), n), replace=True)
    return bt_indexes
def compute_bootstrapped_cindexes(df, duration_col, event_col, formula_model, n):
    df2 = df.copy()
    df2 = df2.reset_index(drop=True)
    indexes = create_bootstrap_data(df=df2, n=n)
    index_all = set(df2.index)
    cindexes_fitted = []
    cindexes_test_oob = []
    cindexes_test_optimism = []
    cindexes_train_red = []
    cindexes_train_bs = []
    for i in tqdm.tqdm(range(n)):
        # Create a model and fit it to the bootstrap sample
        cph = CoxPHFitter()
        cph.fit(df=df2.loc[indexes[:, i], :], duration_col=duration_col,
                event_col=event_col, formula=formula_model)
        # Append the c-index reported by the fitted model
        cindexes_fitted.append(cph.concordance_index_)
        # Compute the c-index on different sets of samples
        # (.loc needs list-likes, not sets, in recent pandas)
        index_i = list(set(indexes[:, i]))
        index_test = list(index_all.difference(index_i))
        # Test on the out-of-bag samples
        pred = -cph.predict_partial_hazard(df2.loc[index_test, :])
        cindex_test = concordance_index(df2.loc[index_test, duration_col], pred,
                                        df2.loc[index_test, event_col])
        cindexes_test_oob.append(cindex_test)
        # Test on the original sample (for the optimism estimate)
        pred = -cph.predict_partial_hazard(df2)
        cindex_test = concordance_index(df2[duration_col], pred, df2[event_col])
        cindexes_test_optimism.append(cindex_test)
        # Check the c-index on the *set* of training samples, duplicates removed
        pred = -cph.predict_partial_hazard(df2.loc[index_i, :])
        cindex_train = concordance_index(df2.loc[index_i, duration_col], pred,
                                         df2.loc[index_i, event_col])
        cindexes_train_red.append(cindex_train)
        # Recompute the c-index on the full bootstrap sample using concordance_index
        pred = -cph.predict_partial_hazard(df2.loc[indexes[:, i], :])
        cindex_train = concordance_index(df2.loc[indexes[:, i], duration_col], pred,
                                         df2.loc[indexes[:, i], event_col])
        cindexes_train_bs.append(cindex_train)
    print('Train c-index (extracted from the model): {:3.3f} +- {:3.3f}'.format(
        np.mean(cindexes_fitted), np.std(cindexes_fitted)))
    print('Train c-index (recalculated using concordance_index): {:3.3f} +- {:3.3f}'.format(
        np.mean(cindexes_train_bs), np.std(cindexes_train_bs)))
    print('Train c-index (computed only on set of train samples): {:3.3f} +- {:3.3f}'.format(
        np.mean(cindexes_train_red), np.std(cindexes_train_red)))
    print('Test c-index (computed on the original data for optimism): {:3.3f} +- {:3.3f}'.format(
        np.mean(cindexes_test_optimism), np.std(cindexes_test_optimism)))
    print('Test c-index (computed only on out-of-bag samples): {:3.3f} +- {:3.3f}'.format(
        np.mean(cindexes_test_oob), np.std(cindexes_test_oob)))
rossi = load_rossi()
n = 100
duration_col = 'week'
event_col = 'arrest'
formula_model = 'fin + age + race + wexp + mar + paro + prio'
compute_bootstrapped_cindexes(rossi, duration_col, event_col, formula_model, n)
In this particular run, the results are:
Train c-index (extracted from the model): 0.499 +- 0.037
Train c-index (recalculated using concordance_index): 0.655 +- 0.024
Train c-index (computed only on set of train samples): 0.645 +- 0.022
Test c-index (computed on the original data for optimism): 0.631 +- 0.008
Test c-index (computed only on out-of-bag samples): 0.609 +- 0.036
The first value looks awful; it is essentially random. This is the one extracted from the model summary. The second is computed with concordance_index on the same samples used to fit the bootstrapped resample. The third line is the same, but with the duplicated entries of the resample removed. The fourth line is the c-index calculated on the original sample, which is a mixture of samples used to fit the model and samples it never saw (this is for calculating the optimism). The last line is the c-index on the out-of-bag samples.
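For reference, these per-bootstrap quantities are the ingredients of Harrell's optimism-corrected c-index. A minimal sketch of how they combine (the function name is mine; c_apparent, the c-index of the model fit on and evaluated on the full original data, is not one of the numbers printed above):

```python
import numpy as np

def optimism_corrected_cindex(c_apparent, c_train_boot, c_orig_boot):
    """Harrell's bootstrap optimism correction (sketch).

    c_apparent:   c-index of the model fit on, and evaluated on, the full data
    c_train_boot: per-bootstrap c-indexes on the bootstrap samples themselves
    c_orig_boot:  per-bootstrap c-indexes of those models on the original data
    """
    optimism = np.mean(np.asarray(c_train_boot) - np.asarray(c_orig_boot))
    return c_apparent - optimism
```

With the means above, the estimated optimism is roughly 0.655 - 0.631 = 0.024, which would be subtracted from the apparent c-index.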
At least to me, the only one that doesn't look ok is the first one, which comes directly from the fitted model. Could it be related to how CoxPHFitter handles the DataFrame indexes internally? If you do not ignore warnings, this pops up:
/usr/local/lib/python3.9/site-packages/lifelines/utils/__init__.py:923: UserWarning: DataFrame Index is not unique, defaulting to incrementing index instead.
EDIT: If I reset the index here:
cph.fit(df=df2.loc[indexes[:, i], :].reset_index(drop=True), duration_col=duration_col,
        event_col=event_col, formula=formula_model)
Now everything looks fine: Train c-index (extracted from the model) == Train c-index (recalculated using concordance_index).
So it seems that if the DataFrame contains duplicated index labels, something breaks down...
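The pitfall is easy to reproduce with plain pandas: a bootstrap resample taken with .loc carries the repeated index labels along, and label-based alignment against a non-unique index is ambiguous. This is a toy illustration of the index state, not of lifelines internals:

```python
import pandas as pd

df = pd.DataFrame({"x": [10, 20, 30]})

# a bootstrap resample with repeated labels -> non-unique index
boot = df.loc[[0, 0, 2]]
print(boot.index.is_unique)   # False: the labels are [0, 0, 2]

# resetting to a fresh 0..n-1 RangeIndex makes the labels unique again,
# which is what the reset_index(drop=True) workaround above achieves
boot = boot.reset_index(drop=True)
print(boot.index.is_unique)   # True
```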