pycox
predict survival function
Hi, thanks for developing this package, it is really useful. In the "DeepHitSingle" class, what are the differences between "predict", "predict_surv", and "predict_surv_df"?
Hi! "predict_surv" and "predict_surv_df" give the same survival predictions, but "predict_surv_df" returns the results as a pandas DataFrame, while "predict_surv" returns them as a numpy array or torch tensor.
The "predict" function only passes the data through the network, so its output represents different things for different models. In the case of DeepHit, if you call "predict", pad the output vector with a zero, and apply the softmax, you get estimates of the probability mass function: https://github.com/havakv/pycox/blob/master/pycox/models/pmf.py#L39
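The pad-and-softmax step described above can be sketched as follows. This is a minimal illustration in plain numpy, not pycox's actual implementation; the `preds` array stands in for the raw network output that "predict" would return for two subjects over four time intervals.

```python
import numpy as np

# Hypothetical raw network output ("predict") for 2 subjects, 4 time intervals.
preds = np.array([[0.2, -1.0, 0.5, 0.1],
                  [1.3, 0.0, -0.7, 0.4]])

# Pad each row with a zero: the fixed (m+1)-th output.
padded = np.concatenate([preds, np.zeros((preds.shape[0], 1))], axis=1)

# Numerically stable softmax over each row.
exp = np.exp(padded - padded.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)

pmf = probs[:, :-1]           # probability mass over the observed intervals
surv_past_end = probs[:, -1]  # probability of surviving past the last interval
```

For each subject, `pmf` and `surv_past_end` together sum to 1, which is why the pmf alone does not.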
Hi @havakv, can you explain why you pad the vector with a zero before taking the softmax? If I understand correctly, to get the pmf you then drop the last element of the softmax of the zero-padded predictions, so what you define as the pmf does not sum to 1. Is this to account for the possibility of being alive after the time frame of the training dataset (i.e. right-censored at T_max)?
Hi @ShadiRahimian. You are correct that the dropped last element represents the probability of surviving past the final point in time!
An explanation is given in equation (9) here (DeepHit is a pmf method): https://link.springer.com/article/10.1007/s10985-021-09532-6#Equ9. The final (implicit) output m+1 represents the probability of survival, while the others represent the probability mass function. It is just a matter of convention that we set the final output (m+1) of the network to be 0, as that gives the probability of survival past the end time, as in (10).
An alternative implementation would be to have one extra output of the network and no zero-padding. That would also work perfectly fine. I guess that is the convention in machine learning, while in statistics you fix one of the outputs so the solution is unique.
You can think of this as implementing a binary classifier with either one or two outputs. If you have one output, you use the sigmoid; if you have two, you use the softmax. Here the convention is to have one output, but if you have two, they will have a strict relationship (after the softmax) of out_1 = 1 - out_2.
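The sigmoid/softmax analogy can be checked numerically. This is a small sketch (the value of `logit` is arbitrary): fixing the second logit to 0 and applying the softmax gives exactly the sigmoid of the first logit, mirroring how DeepHit's zero-padding fixes the (m+1)-th output.

```python
import numpy as np

logit = 1.7  # arbitrary single raw output for the "one output" convention

# One-output convention: sigmoid.
p_sigmoid = 1.0 / (1.0 + np.exp(-logit))

# Two-output convention: fix the second logit to 0 and take the softmax.
logits = np.array([logit, 0.0])
exp = np.exp(logits - logits.max())
p_softmax = exp / exp.sum()

# softmax([z, 0])[0] equals sigmoid(z), and the two outputs sum to 1,
# i.e. out_1 = 1 - out_2.
```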
Does this make sense?
@havakv Thanks a lot for the answer! As always, clear and complete :)