
Adding Brier-Score Calculation and Calibration Plots (smooth and binned version)

Open laqua-stack opened this issue 4 years ago • 5 comments

Hi @sebp, this pull request adds an IPCW-based Brier score calculation to metrics.py. For details about the calculation, see the comments in metrics.py.

Moreover, a modified version of R's calPlot using loess smoothing on jackknife pseudovalues is suggested for visual assessment of model calibration.

For a usage scenario, I modified your Random Survival Forest example. Maybe you would like to add/modify the post on your blog? TODOs:

  • shipping an Apache license with it, as the Brier score calculation is partly a modified version from pysurvival, which comes with the Apache 2.0 license. Hence, it may be modified and shared, but the license should be mentioned, shouldn't it?
  • changing the product-limit estimator used (from lifelines) to the one in sksurv.nonparametric. However, for the binned version of the calibration plot, confidence intervals for the Kaplan-Meier estimates are needed.
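For readers following the discussion, here is a minimal, numpy-only sketch of the IPCW Brier score idea. This is not the PR's actual code; the function names (`ipcw_brier_score`, `km_censoring`) are illustrative. The censoring weights Ĝ are estimated with a Kaplan-Meier fit that treats censoring as the event of interest:

```python
import numpy as np

def km_censoring(time, event):
    """Kaplan-Meier estimate of the censoring survival function G,
    obtained by treating censoring (event == 0) as the event of interest."""
    uniq = np.unique(time)
    probs, surv = [], 1.0
    for u in uniq:
        at_risk = np.sum(time >= u)
        n_cens = np.sum((time == u) & (event == 0))
        surv *= 1.0 - n_cens / at_risk
        probs.append(surv)
    return uniq, np.asarray(probs)

def _g_at(g_times, g_probs, q, left=False):
    """Evaluate the right-continuous step function at q;
    left=True returns the left limit G(t-)."""
    side = "left" if left else "right"
    idx = np.searchsorted(g_times, q, side=side) - 1
    return np.where(idx < 0, 1.0, g_probs[np.clip(idx, 0, None)])

def ipcw_brier_score(time, event, surv_prob, t):
    """IPCW Brier score at time t; surv_prob[i] is the predicted S(t | x_i)."""
    g_times, g_probs = km_censoring(time, event)
    score = np.zeros(len(time))
    had_event = (time <= t) & (event == 1)
    at_risk = time > t
    # observed event before t: residual S(t|x)^2, weighted by 1 / G(T_i-)
    score[had_event] = surv_prob[had_event] ** 2 / _g_at(
        g_times, g_probs, time[had_event], left=True)
    # still at risk at t: residual (1 - S(t|x))^2, weighted by 1 / G(t)
    score[at_risk] = (1.0 - surv_prob[at_risk]) ** 2 / _g_at(
        g_times, g_probs, np.full(int(at_risk.sum()), float(t)))
    # observations censored before t contribute zero under IPCW
    return score.mean()
```

With no censoring, all weights are 1 and the score reduces to the mean squared error between the predicted survival probability and the at-risk indicator 1{T_i > t}.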

If you have any questions, feel free to ask! I hope that I can respond accordingly ;-)

Best regards, Fabian Laqua

laqua-stack avatar Mar 20 '20 01:03 laqua-stack

Dear @sebp,

thank you for your reply. Yes, we need unit tests. However, even though there is an R implementation of the calibration plot, I haven't translated the R code (I have to admit that I cannot read R ;-) ), but rather implemented what I found in the cited papers. Unfortunately, I am currently not able to spend much time on it, since I am full-time involved in clinical routine work (COVID-19...). That is also why I opened the pull request with incompletely tested code. I hoped that someone (maybe you :-) ) could take a deep look at it and perform the tests.

laqua-stack avatar Mar 21 '20 02:03 laqua-stack

I had a closer look at your code. The implementation of the (integrated) Brier score seems straightforward. In addition, you implemented calibration_curve, which, according to the references you provided, is due to Graw et al. and Gerds et al. I'm not very familiar with these papers, but based on their abstracts, they seem to focus on evaluating competing risks models. Currently, scikit-survival doesn't provide any competing risks models, so I wanted to ask how these methods are relevant for evaluating models for right-censored data?

sebp avatar May 29 '20 10:05 sebp

Thank you for your effort. I hope that I can clarify some things.

Yes, their work deals with competing risks, but it can be applied to "normal" right-censored survival data with only one event and uninformative censoring. In this implementation we consider a special case of competing risks analysis where there is just one "competing" event of interest. Then the proposed Aalen-Johansen matrix estimator simplifies to a Kaplan-Meier product-limit estimator.

calibration_curve depicts either a binned calibration curve (pseudovalues = False), where the predicted survival is categorized into k-tiles (e.g. 10 is a common, but arbitrary choice), the KM estimate for each of these bins is calculated (i.e. the actual survival), and it is plotted against the mean predicted survival in the respective bin. This is the visual analogue of the Hosmer-Lemeshow test.
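That binned procedure can be sketched in plain numpy as follows (illustrative names, not the PR's API): quantile-bin the predicted survival probabilities, then compute a Kaplan-Meier estimate within each bin.

```python
import numpy as np

def km_surv_at(time, event, t):
    """Kaplan-Meier estimate of S(t) for one group of observations."""
    surv = 1.0
    for u in np.unique(time[time <= t]):
        at_risk = np.sum(time >= u)
        d = np.sum((time == u) & (event == 1))
        surv *= 1.0 - d / at_risk
    return surv

def binned_calibration(time, event, pred_surv, t, k=10):
    """Split predicted S(t) into k quantile bins and compare the mean
    prediction in each bin with the KM estimate of that bin's members."""
    edges = np.quantile(pred_surv, np.linspace(0.0, 1.0, k + 1))
    bins = np.clip(np.searchsorted(edges, pred_surv, side="right") - 1, 0, k - 1)
    mean_pred, km_obs = [], []
    for b in range(k):
        m = bins == b
        if not m.any():  # skip empty bins
            continue
        mean_pred.append(pred_surv[m].mean())
        km_obs.append(km_surv_at(time[m], event[m], t))
    return np.asarray(mean_pred), np.asarray(km_obs)
```

Plotting `km_obs` against `mean_pred` (with the diagonal as reference) gives the binned calibration plot; a well-calibrated model's points lie near the diagonal.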

Or (pseudovalues = True), jackknife pseudovalues are calculated in the given manner from the KM estimate for the whole population and for the subset where the observation is left out; a smoother (in this case non-parametric loess) is applied to these pseudovalues, and the smoothed curve is plotted against the predicted survival.
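The jackknife construction, theta_i(t) = n * S_hat(t) - (n - 1) * S_hat_{-i}(t), can be sketched as follows (plain numpy, illustrative names; the KM helper is repeated so the snippet stands alone):

```python
import numpy as np

def km_surv_at(time, event, t):
    """Kaplan-Meier estimate of S(t)."""
    surv = 1.0
    for u in np.unique(time[time <= t]):
        at_risk = np.sum(time >= u)
        d = np.sum((time == u) & (event == 1))
        surv *= 1.0 - d / at_risk
    return surv

def jackknife_pseudovalues(time, event, t):
    """theta_i = n * S_hat(t) - (n - 1) * S_hat_{-i}(t), where S_hat_{-i}
    is the KM estimate with observation i left out."""
    n = len(time)
    full = km_surv_at(time, event, t)
    pv = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        pv[i] = n * full - (n - 1) * km_surv_at(time[keep], event[keep], t)
    return pv
```

Without censoring, theta_i(t) reduces to the indicator 1{T_i > t}; with censoring, pseudovalues can fall outside [0, 1], which is why a smoother rather than simple averaging is applied before plotting them against the predicted survival.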

More on pseudo-observations, including their properties in different settings: https://doi.org/10.1177%2F0962280209105020 and the cited papers by Graw et al. and Gerds et al.

Actually, the R function calPlot in the pec package, which this implementation is based on, works very well with "normal" right-censored survival data.

Once competing risks were implemented (hey, what about a competing risks random survival forest? :-) ), we would of course need to update the code (besides many other things to do) by exchanging the KM product-limit estimator for the Aalen-Johansen estimator.

I hope I have clarified what it does. Feel free to ask again if something remains unclear.

laqua-stack avatar May 30 '20 01:05 laqua-stack

I updated the code for brier_score and integrated_brier_score and merged it with master.

I haven't looked into calibration_curve yet, but I'm a little bit hesitant as it does add two additional dependencies: matplotlib and skmisc. In any case, it needs to be rebased onto the latest changes in master and accompanied by unit tests first.

sebp avatar Jun 05 '20 18:06 sebp

Yes, the matplotlib dependency could be resolved if we instead returned the points of the calibration curve and let the user plot them, giving the matplotlib code as an example.

There would be an option to use sklearn.neighbors.KNeighborsRegressor instead of the nonparametric loess smoother. However, we would need a way to ascertain the optimal bandwidth, like in R's pec package. The beauty of loess is its simplicity with regard to the parameters to be set.
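To illustrate the bandwidth question, here is a hedged, numpy-only sketch of a k-NN smoother with leave-one-out selection of k (in practice sklearn's KNeighborsRegressor would play the role of `knn_smooth`; both function names here are illustrative):

```python
import numpy as np

def knn_smooth(x, y, x_eval, k):
    """Predict y at each point of x_eval as the mean of its k nearest
    neighbours in x (1-D nearest-neighbour regression)."""
    out = np.empty(len(x_eval))
    for j, q in enumerate(x_eval):
        nearest = np.argsort(np.abs(x - q))[:k]
        out[j] = y[nearest].mean()
    return out

def select_k(x, y, candidates):
    """Pick k (the 'bandwidth') by leave-one-out cross-validation,
    minimising the squared prediction error."""
    n = len(x)
    best_k, best_err = candidates[0], np.inf
    for k in candidates:
        err = 0.0
        for i in range(n):
            keep = np.arange(n) != i
            err += (knn_smooth(x[keep], y[keep], x[i:i + 1], k)[0] - y[i]) ** 2
        if err < best_err:
            best_k, best_err = k, err
    return best_k
```

Here x would be the predicted survival probabilities and y the pseudovalues; loess avoids this selection step, since its default span usually works out of the box, which is the simplicity argument made above.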

laqua-stack avatar Jun 07 '20 22:06 laqua-stack