lifelines
include median confidence interval in estimates?
Hello,
Would it make sense to include median confidence intervals in the estimates? For example, after fitting the object returned by KaplanMeierFitter(), one could do:
# get the confidence interval of the median
# start by finding the median on the timeline
median_lifespan = kmf.median_survival_time_
# Index.get_loc(..., method=...) was removed in pandas 2.0; get_indexer does the same lookup
median_ci_idx = kmf.confidence_interval_.index.get_indexer(
    [median_lifespan], method='nearest')[0]
# then get the confidence interval at the point of the median
median_ci = kmf.confidence_interval_.iloc[median_ci_idx]
# to find the confidence interval for the value of the median,
# draw horizontal lines at the two bound values and see where the
# KM curve crosses them; note that the greater value comes earlier,
# and the smaller later
lcl = (kmf.survival_function_['KM_estimate']
       <= median_ci.loc['KM_estimate_upper_0.95']).idxmax()
ucl = (kmf.survival_function_['KM_estimate']
       <= median_ci.loc['KM_estimate_lower_0.95']).idxmax()
This is what is essentially done by R, as explained in print.survfit.
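To make the recipe above easier to follow without a fitted model, here is a minimal pure-Python sketch of the same three steps. It assumes the KM estimate and both confidence bands are plain lists of survival probabilities on one shared, sorted timeline; the helper names (`first_time_at_or_below`, `median_ci_horizontal_line`) and the toy numbers are hypothetical, not part of lifelines.

```python
import math


def first_time_at_or_below(times, values, level):
    """First time at which a non-increasing step curve is <= level.

    Returns math.inf if the curve never drops that far (e.g. the
    survival curve stays above 0.5, so the median is undefined).
    """
    for t, v in zip(times, values):
        if v <= level:
            return t
    return math.inf


def median_ci_horizontal_line(times, km, lower, upper):
    """Hypothetical helper: (median, lcl, ucl) following the recipe above."""
    # 1. Median: first time the KM estimate reaches 0.5 or below.
    median = first_time_at_or_below(times, km, 0.5)
    if math.isinf(median):
        return math.inf, math.inf, math.inf
    # 2. Read both confidence bounds at the median's position on the timeline.
    idx = times.index(median)
    upper_at_median, lower_at_median = upper[idx], lower[idx]
    # 3. Horizontal lines at those bound values, intersected with the KM curve.
    #    The larger (upper) value is reached earlier, so it gives the lower limit.
    lcl = first_time_at_or_below(times, km, upper_at_median)
    ucl = first_time_at_or_below(times, km, lower_at_median)
    return median, lcl, ucl


# Made-up step curves on a shared timeline (illustrative numbers only).
TIMES = [0, 1, 2, 3, 4, 5, 6]
KM = [1.0, 0.9, 0.75, 0.6, 0.45, 0.3, 0.2]
LOWER = [1.0, 0.8, 0.62, 0.46, 0.31, 0.18, 0.1]
UPPER = [1.0, 0.96, 0.86, 0.74, 0.6, 0.45, 0.34]

print(median_ci_horizontal_line(TIMES, KM, LOWER, UPPER))  # (4, 3, 5)
```

Because the median here is taken from the timeline itself, an exact index lookup replaces the `method='nearest'` search used on the real DataFrame.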
There is a util function in lifelines to make this easy:
from lifelines.utils import median_survival_times
print(median_survival_times(kmf.confidence_interval_))
# dataframe
"""
0.5
KM_estimate_upper_0.95 530.0
KM_estimate_lower_0.95 468.0
"""
What I think would be more interesting is adding a __repr__ method to a fitted KaplanMeierFitter in which this could be expressed.
Thanks!
In fact this is a different interpretation of the median confidence interval than the one used in print.survfit. In lifelines we get the median of the lower and the upper confidence interval functions; in print.survfit we get the median of the KM function and then find the intersection of the horizontal line with the two confidence interval functions.
I am not a statistician so I do not know which of the two most people would mean when they talk about confidence intervals of the median.
median_survival_times doesn't work like that anymore in the current release. Was this functionality moved somewhere else?
@robertcv what's not working? Can you provide an example or code snippet?
In the previous comment you pointed to median_survival_times as a way of getting the 95% confidence interval for the median survival time. In the current version of the package, this function only returns the median survival time and does not include the interval. I worked around this by implementing the confidence interval myself; it works similarly to what the OP described.
I'm not sure I'm seeing the same thing as you:
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.utils import median_survival_times
kmf = KaplanMeierFitter().fit(np.random.exponential(2, size=50000))
print(median_survival_times(kmf.confidence_interval_))
returns two values, 1.383272 & 1.418151
I am sorry, now I see why it didn't work for me. The function behaves differently depending on its input. I used median_survival_times(kmf) instead of median_survival_times(kmf.confidence_interval_) and only got the median time. I would suggest updating the documentation for this function: the parameter description uses "or", so I assumed it made no difference whether I passed the model class or the confidence interval DataFrame.