
Format of time varying dataset

Open faraya-outra opened this issue 2 years ago • 1 comments

Hello,

I want to use RDSM, but I would like to confirm some things regarding the formatting:

  1. Why does the time variable decrease as you move along the observations for a particular ID? Normally, when I use a survival package, time is strictly increasing. For example:
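
     | id | status2 | time | drug_D-penicil | drug_placebo | sex_female |
     |----|---------|------|----------------|--------------|------------|
     | 1 | 1 | 1.095170 | 1.0 | 0.0 | 1.0 |
     | 1 | 1 | 0.569489 | 1.0 | 0.0 | 1.0 |
     | 2 | 0 | 14.152338 | 1.0 | 0.0 | 1.0 |
     | 2 | 0 | 13.654036 | 1.0 | 0.0 | 1.0 |
     | 2 | 0 | 13.152995 | 1.0 | 0.0 | 1.0 |
     | 2 | 0 | 12.049611 | 1.0 | 0.0 | 1.0 |
     | 2 | 0 | 9.251451 | 1.0 | 0.0 | 1.0 |
     | 2 | 0 | 8.263060 | 1.0 | 0.0 | 1.0 |
     | 2 | 0 | 7.266455 | 1.0 | 0.0 | 1.0 |
     | 2 | 0 | 6.261636 | 1.0 | 0.0 | 1.0 |
     | 2 | 0 | 5.319790 | 1.0 | 0.0 | 1.0 |
     | 3 | 1 | 2.770781 | 1.0 | 0.0 | 0.0 |
     | 3 | 1 | 2.288906 | 1.0 | 0.0 | 0.0 |
     | 3 | 1 | 1.774176 | 1.0 | 0.0 | 0.0 |
     | 3 | 1 | 0.736502 | 1.0 | 0.0 | 0.0 |
     | 4 | 1 | 5.270507 | 1.0 | 0.0 | 1.0 |
     | 4 | 1 | 4.755777 | 1.0 | 0.0 | 1.0 |
     | 4 | 1 | 4.251999 | 1.0 | 0.0 | 1.0 |
     | 4 | 1 | 3.274559 | 1.0 | 0.0 | 1.0 |
     | 4 | 1 | 1.837148 | 1.0 | 0.0 | 1.0 |
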
  2. Does each record for an ID represent a change in one of the covariates, or is each record just an increment in time, regardless of whether a covariate changed?

  3. Does the model always have to be fed a list containing a separate numpy matrix per ID, with each matrix holding that ID's records?
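
To make the last question concrete, here is a minimal sketch of the layout I have in mind: a long-format table split into one numpy array per ID. The column names come from the table above; the names `x`, `t`, `e`, the `feature_cols` list, and the grouping approach are my own assumptions for illustration, not confirmed package behavior.

```python
import pandas as pd

# Toy rows copied from the table above (ids 1 and 3 only).
df = pd.DataFrame({
    "id": [1, 1, 3, 3, 3, 3],
    "status2": [1, 1, 1, 1, 1, 1],
    "time": [1.095170, 0.569489, 2.770781, 2.288906, 1.774176, 0.736502],
    "drug_D-penicil": [1.0] * 6,
    "drug_placebo": [0.0] * 6,
    "sex_female": [1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
})

# Assumed split between covariates and outcome columns.
feature_cols = ["drug_D-penicil", "drug_placebo", "sex_female"]

# Build one array per id: covariates, times, and event indicators.
x, t, e = [], [], []
for _, group in df.groupby("id", sort=False):
    x.append(group[feature_cols].to_numpy())  # shape: (n_records, n_features)
    t.append(group["time"].to_numpy())        # shape: (n_records,)
    e.append(group["status2"].to_numpy())     # shape: (n_records,)
```

Each list then has one entry per ID, and the arrays for different IDs can have different numbers of rows.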

Thank you :)

Amazing package!

faraya-outra avatar Apr 04 '22 10:04 faraya-outra

I too am struggling to understand the format of x, t, e when using RDSM. Can someone explain exactly what format is expected? Also, coming from Lifelines: for the same "id", only the row with the death event is marked as "not censored", and the previous rows with the same id are marked as "censored". With the PBC dataset, I see that if the patient died, then every row is marked as "dead" for each t. Is that correct?
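
To show the two labeling conventions I am comparing, here is a small pandas sketch. The toy rows and the `event_last_row_only` column name are made up for illustration, and it assumes the rows for each id are ordered so that the final row is the last observation. It only contrasts the conventions and does not claim which one RDSM expects.

```python
import pandas as pd

# Toy data: status2 is repeated on every row of an id,
# as in the PBC-style table from the original post.
df = pd.DataFrame({
    "id":      [1, 1, 2, 2, 2, 3, 3, 3, 3],
    "status2": [1, 1, 0, 0, 0, 1, 1, 1, 1],
})

# True only on the last row of each id.
is_last = ~df.duplicated("id", keep="last")

# Alternative convention: the event flag is set only on the final
# row of an id; all earlier rows are treated as censored.
df["event_last_row_only"] = (df["status2"].astype(bool) & is_last).astype(int)

print(df)
```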

humana avatar May 06 '23 11:05 humana