
Splines not supported by standsurv without specifying newdata

Open markdanese opened this issue 2 years ago • 2 comments

I really appreciate this package. It makes things much easier, particularly with regard to generating causal contrasts and getting reasonable variance estimates.

I ran into an issue trying to get standsurv() to work with a natural spline from the splines package. In this case, age was a predictor in a model of time to death (in lung cancer). When age was entered as a simple continuous variable, standsurv() worked fine without my having to specify the data set. When I changed it to a natural spline (ns(age, 2)) to handle some non-linearity in the increase in risk with age, I got an error saying that the variable age could not be found. Helpfully, the error message suggested that I specify "newdata".

I noted that the model object includes the transformed age (i.e., in this case with 2 spline terms), so the error makes sense -- age is not there. And when I specified newdata = the original dataset, it worked without an error.

I am guessing that the predict function (which I think is part of summary()) isn't built for this use case. I tried to see how to work around this and suggest a code change, but I couldn't find anything helpful.

The simple workaround is to explicitly specify the original dataset, so it is not a critical issue. However, I wanted to put this out there in case anyone runs into this.
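A minimal sketch of this workaround, assuming the lung data from the survival package, a Weibull distribution, and sex as the contrast variable. The post does not say which data set, distribution, or contrast was used, so all three are illustrative choices:

```r
library(survival)  # Surv() and the lung data
library(splines)   # ns()
library(flexsurv)  # flexsurvreg() and standsurv()

# Illustrative data only: keep the columns the model uses
d <- lung[, c("time", "status", "age", "sex")]

# Age enters through a natural spline with 2 degrees of freedom
fit <- flexsurvreg(Surv(time, status) ~ ns(age, 2) + sex,
                   data = d, dist = "weibull")

# Passing the original data as newdata gives standsurv() the raw
# age values, so ns(age, 2) can be re-evaluated from them
ss <- standsurv(fit,
                newdata = d,
                at = list(list(sex = 1), list(sex = 2)),
                t = c(180, 365, 730))
ss
```

Per the report above, dropping the newdata argument from this call is what triggers the "cannot find age" error.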

markdanese, Jul 21 '23 22:07

Thanks for the report. The default newdata that the flexsurvreg predict method uses is the "model frame" that is created in this line of flexsurvreg.R. When run with a ns() formula, this line seems to put the basis variables into the model frame, rather than the original covariate values that we want. I haven't used ns and the like, so I can't see a quick fix. I will leave this open.
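The behaviour described here can be reproduced with base R alone. The sketch below uses a made-up data frame (the names d, y and age are hypothetical) and shows that the model frame holds the spline basis matrix rather than the raw covariate:

```r
library(splines)

# Hypothetical toy data, for illustration only
set.seed(1)
d <- data.frame(y = rnorm(20), age = runif(20, 40, 80))

mf <- model.frame(y ~ ns(age, 2), data = d)

names(mf)                # "y" "ns(age, 2)"
"age" %in% names(mf)     # FALSE: the raw covariate is not stored
dim(mf[["ns(age, 2)"]])  # 20 2: a two-column basis matrix
```

Any code that later looks for a column named age in this model frame will not find it.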

chjackson, Jul 26 '23 09:07

This is proving tricky to handle. The function stats::get_all_vars seems like it would be useful here, as it is designed to extract the original variables supplied to a formula, whereas stats::model.frame extracts the transformed versions. However, get_all_vars fails when the formula contains a data frame look-up with $. For example, compare:

library(survival) # provides the ovarian data set
get_all_vars(ovarian$futime ~ 1, data = NULL) # fails
model.frame(ovarian$futime ~ 1, data = NULL) # works
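For contrast, here is a small sketch of the case where get_all_vars does what is wanted: a formula with a spline term and an explicit data argument. The data frame and its column names are hypothetical, and this is not a proposed patch to flexsurv:

```r
library(splines)

# Hypothetical toy data, for illustration only
set.seed(1)
d <- data.frame(y = rnorm(20), age = runif(20, 40, 80))

# With a data argument, get_all_vars() returns the untransformed
# variables named in the formula
raw <- get_all_vars(y ~ ns(age, 2), data = d)
names(raw)  # "y" "age"

# The $ look-up case is wrapped in tryCatch() so that the script
# keeps running; the condition object is returned for inspection
res <- tryCatch(get_all_vars(d$y ~ 1, data = NULL),
                error = function(e) e)
class(res)
```

The difference arises because get_all_vars evaluates each name returned by all.vars() on its own, so a name that only exists as a column reached through $ is not found, whereas model.frame evaluates the whole expression d$y.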

chjackson, Dec 01 '23 17:12