lifelines

include median confidence interval in estimates?

louridas opened this issue 8 years ago • 7 comments

Hello,

Would it make sense to include confidence intervals for the median in the estimates? For example, with the object returned by KaplanMeierFitter(), after fitting, one could do:

```python
# assumes `kmf` is an already-fitted lifelines KaplanMeierFitter
median_lifespan = kmf.median_survival_time_

# get the confidence interval of the median
# start by finding the median on the timeline
# (Index.get_loc(..., method='nearest') is no longer supported in recent pandas)
median_ci_idx = kmf.confidence_interval_.index.get_indexer(
    [median_lifespan], method='nearest')[0]
# then get the confidence interval at the point of the median
median_ci = kmf.confidence_interval_.iloc[median_ci_idx]
# to find the confidence interval for the value of the median
# we will get a horizontal line passing through the value of the
# median; note that the greater value comes earlier, and the smaller later
lcl = (kmf.survival_function_['KM_estimate']
       <= median_ci.loc['KM_estimate_upper_0.95']).idxmax()
ucl = (kmf.survival_function_['KM_estimate']
       <= median_ci.loc['KM_estimate_lower_0.95']).idxmax()
```

This is essentially what R does, as explained in the documentation for print.survfit.
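For readers without lifelines at hand, here is a minimal pure-Python sketch of the same procedure on plain lists. The helper names (`first_time_at_or_below`, `median_ci_horizontal`) and the toy curves are hypothetical, and the sketch assumes the KM estimate actually reaches 0.5.

```python
import math


def first_time_at_or_below(timeline, curve, level):
    """First time at which a step curve is <= level; inf if it never gets there."""
    for t, s in zip(timeline, curve):
        if s <= level:
            return t
    return math.inf


def median_ci_horizontal(timeline, km, lower, upper):
    """Hypothetical pure-Python restatement of the KaplanMeierFitter snippet above."""
    # 1. median: first time the KM estimate drops to 0.5 or below
    median = first_time_at_or_below(timeline, km, 0.5)
    if math.isinf(median):
        raise ValueError("KM estimate never reaches 0.5, so the median is undefined")
    # 2. band values at the timeline point nearest the median
    i = min(range(len(timeline)), key=lambda k: abs(timeline[k] - median))
    # 3. where the KM curve itself first drops to those band values;
    #    the upper band value is larger, so it is reached earlier (lower limit)
    lcl = first_time_at_or_below(timeline, km, upper[i])
    ucl = first_time_at_or_below(timeline, km, lower[i])
    return median, lcl, ucl


# toy step curves (made-up numbers, for illustration only)
timeline = [0, 1, 2, 3, 4, 5, 6]
km = [1.0, 0.9, 0.7, 0.55, 0.45, 0.3, 0.2]
lower = [1.0, 0.8, 0.6, 0.45, 0.35, 0.2, 0.1]
upper = [1.0, 0.97, 0.8, 0.65, 0.55, 0.4, 0.3]

print(median_ci_horizontal(timeline, km, lower, upper))  # (4, 3, 5)
```

One design difference from the pandas version: `idxmax()` on an all-False mask returns the first index label, whereas this helper returns infinity when the level is never reached.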

louridas, Feb 17 '17

There is a util function in lifelines to make this easy:

```python
from lifelines.utils import median_survival_times
print(median_survival_times(kmf.confidence_interval_))
# dataframe
"""
                          0.5
KM_estimate_upper_0.95  530.0
KM_estimate_lower_0.95  468.0
"""
```

What I think would be more interesting is adding a __repr__ method to a fitted KaplanMeierFitter, so that this could be shown there.
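To make the `__repr__` idea concrete, here is a minimal sketch of what such a representation might print. `MedianSummary` is a hypothetical class, not part of lifelines, and the numbers are made up:

```python
from dataclasses import dataclass


@dataclass
class MedianSummary:
    """Hypothetical container for a median and its confidence interval."""

    label: str
    median: float
    lower: float
    upper: float
    alpha: float = 0.05

    # dataclass keeps an explicitly defined __repr__ instead of generating one
    def __repr__(self) -> str:
        level = round((1 - self.alpha) * 100)
        return (
            f"<{self.label}: median={self.median:g}, "
            f"{level}% CI=({self.lower:g}, {self.upper:g})>"
        )


print(MedianSummary("KM_estimate", 4, 3, 5))
# <KM_estimate: median=4, 95% CI=(3, 5)>
```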

CamDavidsonPilon, Feb 19 '17

Thanks!

In fact, this is a different interpretation of the median confidence interval from the one used in print.survfit. In lifelines we get the medians of the lower and upper confidence interval functions; in print.survfit we get the median of the KM function and then find the intersection of the horizontal line with the two confidence interval functions.
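Written out with notation introduced here for illustration ($\hat S$ for the Kaplan-Meier estimate, $\hat S_L$ and $\hat S_U$ for the lower and upper pointwise confidence bands), the two readings compared in this comment are roughly:

$$
\begin{aligned}
\hat m &= \min\{t : \hat S(t) \le 0.5\} \\
\text{lifelines (median of each band):}\quad &\Bigl[\min\{t : \hat S_L(t) \le 0.5\},\ \min\{t : \hat S_U(t) \le 0.5\}\Bigr] \\
\text{first snippet above:}\quad &\Bigl[\min\{t : \hat S(t) \le \hat S_U(\hat m)\},\ \min\{t : \hat S(t) \le \hat S_L(\hat m)\}\Bigr]
\end{aligned}
$$

Because $\hat S_L \le \hat S \le \hat S_U$, the lower band crosses 0.5 first, which is why its median gives the lower limit of the interval.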

I am not a statistician, so I do not know which of the two most people mean when they talk about the confidence interval of the median.

louridas, Feb 25 '17

median_survival_times doesn't work like that anymore in the current release. Was this functionality moved somewhere else?

robertcv, Mar 10 '21

@robertcv what's not working? Can you provide an example or code snippet?

CamDavidsonPilon, Apr 06 '21

In an earlier comment you pointed to median_survival_times as a way of getting the 95% confidence interval for the median survival time. In the current version of the package, this function returns only the median survival time, without the interval. I worked around this by implementing the confidence interval myself, in a way similar to what the OP described.

robertcv, Apr 06 '21

I'm not sure I'm seeing the same thing as you:

```python
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.utils import median_survival_times
kmf = KaplanMeierFitter().fit(np.random.exponential(2, size=50000))
print(median_survival_times(kmf.confidence_interval_))
```

This returns two values: 1.383272 and 1.418151.
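A quick plausibility check of those two values: `np.random.exponential(2, ...)` samples an exponential distribution with scale (mean) 2, whose median solves $e^{-t/2} = 0.5$, i.e. $t = 2\ln 2 \approx 1.386$. That falls inside the reported interval:

```python
import math

# numpy.random.exponential(scale) has mean `scale`; its median solves
# exp(-t / scale) = 0.5, i.e. t = scale * ln 2
scale = 2.0
true_median = scale * math.log(2)

# interval reported in the comment above (from one random sample)
lower, upper = 1.383272, 1.418151
print(round(true_median, 6), lower <= true_median <= upper)  # 1.386294 True
```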

CamDavidsonPilon, Apr 06 '21

Sorry, I now see why it didn't work for me: the function behaves differently depending on its input. I called median_survival_times(kmf) instead of median_survival_times(kmf.confidence_interval_) and got only the median time. I would suggest updating the documentation of this function: the parameter description uses "or", so I assumed it made no difference whether I passed the fitted model or the confidence interval DataFrame.

robertcv, Apr 06 '21