
predict survival function

Open liguowang opened this issue 3 years ago • 4 comments

Hi, thanks for developing this package. It is really useful. In the "DeepHitSingle" class, what are the differences between "predict", "predict_surv", and "predict_surv_df"?

liguowang avatar Jan 02 '21 02:01 liguowang

Hi! "predict_surv" and "predict_surv_df" give the same survival predictions, but "predict_surv_df" returns the results as a pandas DataFrame, while "predict_surv" returns them as a numpy array or torch tensor.

The "predict" function only passes the data through the network, meaning it represents different things for different models. In the case of DeepHit, if you call predict, pad the output vector with a zero, and apply the softmax, you get estimates of the probability mass function: https://github.com/havakv/pycox/blob/master/pycox/models/pmf.py#L39
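As a rough sketch of that pad-and-softmax step (this is an illustration in plain numpy, not pycox's actual implementation; the input values are made up):

```python
import numpy as np

def pad_softmax_pmf(phi):
    """Given raw network outputs phi of shape (n, m), append a zero
    column (the implicit (m+1)-th output) and apply a numerically
    stable softmax. Returns (pmf, surv_past_end)."""
    padded = np.concatenate([phi, np.zeros((phi.shape[0], 1))], axis=1)
    e = np.exp(padded - padded.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)
    # First m entries: pmf over the time grid; last entry: probability
    # of surviving past the final time point.
    return probs[:, :-1], probs[:, -1]

phi = np.array([[0.5, -1.2, 0.3, 0.8]])  # hypothetical "predict" output, m = 4
pmf, surv_tail = pad_softmax_pmf(phi)
# pmf and surv_tail together sum to 1 for each subject
```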

havakv avatar Jan 02 '21 15:01 havakv

Hi @havakv, can you explain why you pad the vector with a zero before taking the softmax? If I understand correctly, to get the pmf you then drop the last element of the softmax of the padded vector, so what you define as the pmf does not sum to 1. Is this to account for the possibility of being alive after the time frame of the training dataset (i.e. right-censored at T_max)?

ShadiRahimian avatar Nov 28 '21 13:11 ShadiRahimian

Hi @ShadiRahimian. You are correct that the dropped last element represents the probability of surviving past the final time point!

An explanation is given here (DeepHit is a pmf method), in equation (9): https://link.springer.com/article/10.1007/s10985-021-09532-6#Equ9. The final (implicit) output m+1 represents the survival probability, while the others represent the probability mass function. It is just a matter of convention that we set the final output (m+1) of the network to 0, as that gives the probability of survival past the end time in (10).
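Concretely, once the pmf is normalized jointly with that implicit (m+1)-th output, the survival function follows by a cumulative sum. A small numeric sketch (the pmf values here are made up):

```python
import numpy as np

# Hypothetical pmf over m = 4 time points, already obtained from the
# padded softmax; it sums to less than 1 on purpose.
pmf = np.array([0.1, 0.2, 0.15, 0.25])

surv_past_end = 1.0 - pmf.sum()   # the implicit (m+1)-th probability
surv = 1.0 - np.cumsum(pmf)       # S(t_j) at each time point

# The survival curve at the last time point equals the probability
# of surviving past the end of the time grid.
```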

An alternative implementation would be to have one extra output of the network without any zero-padding. That would also work perfectly fine. I guess that is the convention in machine learning, while in statistics you fix one of the outputs to get a unique solution.

You can think of this as implementing a binary classifier using one or two outputs. If you have one output, you use the sigmoid, and if you have two you use the softmax. Here the convention is to have one output, but if you have two, they will have a strict relationship (after the softmax) of out_1 = 1 - out_2.
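The binary-classifier analogy can be checked numerically: a sigmoid on one logit z gives exactly the same probability as a softmax on the two logits (z, 0), i.e. fixing the second logit to zero (the logit value here is arbitrary):

```python
import numpy as np

z = 1.3  # hypothetical single logit

# One-output convention: sigmoid
p_sigmoid = 1.0 / (1.0 + np.exp(-z))

# Two-output convention with the second logit fixed at 0: softmax
logits = np.array([z, 0.0])
e = np.exp(logits - logits.max())
p_softmax = e[0] / e.sum()

# p_sigmoid == p_softmax, and the two softmax outputs sum to 1,
# so out_1 = 1 - out_2 as described above.
```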

Does this make sense?

havakv avatar Dec 03 '21 07:12 havakv

@havakv Thanks a lot for the answer! As always, clear and complete :)

ShadiRahimian avatar Dec 06 '21 09:12 ShadiRahimian