scikit-survival
Adding Brier-Score Calculation and Calibration Plots (smooth and binned version)
Hi @sebp, this pull request adds an IPCW-based Brier score calculation to metrics.py. For details about the calculation, see the comments in metrics.py.
Moreover, a modified version of R's calPlot using loess smoothing on jackknife pseudovalues is suggested for visual assessment of model calibration.
For a usage scenario, I modified your Random Survival Forest example. Maybe you would like to add/modify the post on your blog? TODOs:
- shipping an Apache license with it, as the Brier score calculation is partly a modified version from pysurvival, which comes with an Apache 2.0 license. Hence, it may be modified and shared, but the license should be mentioned, shouldn't it?
- changing the product-limit estimator used (from lifelines) to the one from sksurv.nonparametric. However, for the binned version of the calibration plot, confidence intervals for the Kaplan-Meier estimates are needed.
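The IPCW-based Brier score calculation mentioned above can be sketched in plain NumPy roughly as follows. This is an illustrative reimplementation, not the code in the PR: all function names are made up for the sketch, ties between event and censoring times are handled naively, and the G(t-) convention is ignored for brevity.

```python
import numpy as np

def km_censoring(time, event):
    """Kaplan-Meier estimate G(t) of the censoring distribution
    (roles of events and censorings swapped; ties handled naively)."""
    order = np.argsort(time)
    t, c = time[order], 1 - event[order]  # c = censoring indicator
    n = len(t)
    at_risk = n - np.arange(n)
    surv = np.cumprod(1.0 - c / at_risk)  # product-limit over censoring times
    return t, surv

def eval_step(t_sorted, surv, x):
    """Evaluate the right-continuous step function at points x."""
    idx = np.searchsorted(t_sorted, x, side="right") - 1
    return np.where(idx < 0, 1.0, surv[np.clip(idx, 0, len(surv) - 1)])

def ipcw_brier(time, event, surv_pred, tau):
    """IPCW Brier score at horizon tau.
    surv_pred holds the predicted S(tau | x_i) for each subject."""
    ts, sv = km_censoring(time, event)
    g_ti = eval_step(ts, sv, time)                                  # G(T_i)
    g_tau = eval_step(ts, sv, np.full(len(time), float(tau)))       # G(tau)
    died = (time <= tau) & (event == 1)
    alive = time > tau
    # subjects censored before tau get weight zero
    score = (surv_pred ** 2) * died / np.where(g_ti > 0, g_ti, np.inf) \
        + ((1.0 - surv_pred) ** 2) * alive / np.where(g_tau > 0, g_tau, np.inf)
    return score.mean()
```

Without censoring, G is identically 1 and this reduces to the ordinary Brier score of the survival indicator.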
In case of any questions, feel free to ask! I hope I can respond accordingly ;-)
Best regards, Fabian Laqua
Dear @sebp,
thank you for your reply. Yes, we need unit tests. However, even though there is an R implementation of the calibration plot, I haven't translated the R code (I must admit that I cannot read R ;-) ), but rather implemented what I found in the cited papers. Unfortunately, I am currently not able to spend much time on it, since I am fully involved in clinical routine work (COVID-19...). That is also why I opened the pull request with incompletely tested code. I hoped that someone (maybe you :-) ) could take a deep look at it and perform the tests.
I had a closer look at your code. The implementation of the (integrated) Brier score seems straightforward. In addition, you implemented calibration_curve, which according to the references you provided is due to Graw et al. and Gerds et al. I'm not very familiar with these papers, but based on their abstracts, they seem to focus on evaluating competing risks models. Currently, scikit-survival doesn't provide any competing risks models, so I wanted to ask how these methods are relevant for evaluating models for right-censored data?
Thank you for your effort. I hope I can clarify some things.
Yes, their work deals with competing risks, but it can be applied to "normal" right-censored survival data with only one event and uninformative censoring. In this implementation we consider a special case of competing risks analysis, where there is just one "competing" event of interest. Then the proposed Aalen-Johansen matrix estimator simplifies to the Kaplan-Meier product-limit estimator.
calibration_curve depicts either a binned calibration curve (pseudovalues = False), where the predicted survival is categorized into k-tiles (e.g. 10 is a common, but arbitrary, choice), the KM estimate for each of these bins is calculated (i.e. the actual survival) and plotted against the mean predicted survival in the respective bin. This is the visual analogue of the Hosmer-Lemeshow test. Or (pseudovalues = True), jackknife pseudovalues are calculated in the given manner from the KM estimate for the whole population and for the subset where the observation is left out; a (in this case non-parametric loess) smoother is applied to these pseudovalues, and the smoothed curve is plotted against the predicted survival.
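The pseudovalue construction described above can be sketched in plain NumPy. This is an illustrative version, not the PR's code: the pseudo-observation for subject i at time t is n * S(t) - (n - 1) * S_{-i}(t), where S_{-i} is the KM estimate with subject i left out (ties handled naively here).

```python
import numpy as np

def km_surv(time, event, t_eval):
    """Kaplan-Meier survival probability at a single time t_eval."""
    order = np.argsort(time)
    t, d = time[order], event[order]
    n = len(t)
    at_risk = n - np.arange(n)
    surv = np.cumprod(1.0 - d / at_risk)
    idx = np.searchsorted(t, t_eval, side="right") - 1
    return 1.0 if idx < 0 else surv[idx]

def jackknife_pseudovalues(time, event, t_eval):
    """Pseudo-observation n * S(t) - (n - 1) * S_{-i}(t) per subject."""
    n = len(time)
    s_full = km_surv(time, event, t_eval)
    pv = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i  # leave subject i out
        pv[i] = n * s_full - (n - 1) * km_surv(time[mask], event[mask], t_eval)
    return pv
```

A useful sanity check: with no censoring at all, the pseudovalue of subject i at time t equals the plain survival indicator I(T_i > t).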
More on pseudo-observations including their properties in different settings: https://doi.org/10.1177%2F0962280209105020 and in the cited papers by Graw and Gerds et al.
Actually, the R function calPlot in the package pec, which this implementation is based on, works fine with "normal" right-censored survival data.
Once competing risks are implemented (hey, what about a competing risks random survival forest? :-) ), we would of course need to update the code (besides many other things to do) by exchanging the KM product-limit estimator for the Aalen-Johansen estimator.
I hope I have clarified what it does. Feel free to ask again if something remains unclear.
I updated the code for brier_score and integrated_brier_score and merged it with master.
I haven't looked into calibration_curve yet, but I'm a little hesitant, as it adds two additional dependencies: matplotlib and skmisc. In any case, it needs to be rebased onto the latest changes in master and accompanied by unit tests first.
Yes, the matplotlib dependency could be resolved if we instead returned the points of the calibration curve, let the user plot them, and provided the matplotlib code as an example.
There would be the option to use sklearn.neighbors.KNeighborsRegressor instead of the non-parametric loess smoother. However, we would need a way to determine the optimal bandwidth, as in R's pec package. The beauty of loess is its simplicity with regard to the parameters to be set.
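A sketch of what that alternative could look like, with the neighborhood size k standing in for the loess bandwidth and chosen by cross-validated grid search (illustrative only; function name and grid are made up for this sketch):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

def knn_smooth(pred, pseudovalues, k_grid=(5, 10, 20, 40)):
    """Smooth pseudovalues against predicted survival with k-NN regression.
    n_neighbors plays the role of the loess bandwidth and is selected
    by 5-fold cross-validation over k_grid."""
    X = np.asarray(pred, dtype=float).reshape(-1, 1)
    y = np.asarray(pseudovalues, dtype=float)
    grid = GridSearchCV(KNeighborsRegressor(),
                        {"n_neighbors": list(k_grid)}, cv=5)
    grid.fit(X, y)
    return grid.best_estimator_.predict(X)
```

This trades loess's single intuitive span parameter for an explicit model-selection step, which is exactly the extra machinery the comment above is wary of.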