flexsurv-dev
flexsurv-dev copied to clipboard
allow conditional sampling in 'sim.fmsm()' ?
Hi,
this is related to #150 - I am trying to sample forward from the predictive distribution of an individual given current state and sojourn time in that state for a multistate model.
sim.fmsm()
only seems to support sampling conditional on last state but cannot accept the respective sojourn time ('start' paramter in simulate.flexsurvreg()
).
Would this be feasible to add to sim.fmsm()
? I suspect the critical lines are around
https://github.com/chjackson/flexsurv-dev/blob/96dfe2ea0a360d4a79996e3b390d7cd27e0ab265/R/mstate.R#L862
@danielinteractive, you might find this interesting too.
Here is a suggestion (admittedly super inefficient) of how this could look like. One of the many problems is that it is super slow to not preallocate the return vecors. Maybe chunked pre-allocation could work. Ideally, the functionality could be integrated in the sim.fmsm()
function though.
library(flexsurv)
mod_nobos_bos <- flexsurvreg(Surv(years, status) ~ 1, subset=(trans==1),
data = bosms3, dist = "weibull")
mod_nobos_death <- flexsurvreg(Surv(years, status) ~ 1, subset=(trans==2),
data = bosms3, dist = "weibull")
mod_bos_death <- flexsurvreg(Surv(years, status) ~ 1, subset=(trans==3),
data = bosms3, dist = "weibull")
tmat <- rbind(c(NA, 1, 2), c(NA, NA, 3), c(NA, NA, NA))
mdl <- fmsm(mod_nobos_bos, mod_nobos_death, mod_bos_death, trans = tmat)
sim <- function(mdl, newdata = data.frame(..null = NA), start = 0, nsim = 1L) {
as.numeric(simulate(mdl, newdata = newdata, nsim = nsim)[1, 1:nsim])
}
simulate_fmsm <- function(mdl, state, newdata = data.frame(..null = NA),
start = 0, tmax = Inf) {
tmat <- attr(mdl, "trans")
terminal_states <- as.integer(which(rowSums(!is.na(tmat)) == 0))
from <- integer()
to <- integer()
t <- numeric()
tt <- 0
while (!(state %in% terminal_states) & (tt < tmax)) {
# sample next state
from <- c(from, state)
next_states <- as.integer(which(!is.na(tmat[state, ])))
next_trans_ids <- tmat[state, next_states]
# sample from transitions
times <- numeric(length(next_states))
for (i in seq_along(next_trans_ids)) {
trans <- next_trans_ids[i]
times[i] <- sim(mdl[[i]], newdata = newdata, start = start, nsim = 1L)
}
state <- next_states[which.min(times)]
to <- c(to, state)
tt <- tt + min(times)
t <- c(t, tt)
start <- sqrt(.Machine$double.eps) # future states are censored at 0
}
# combine vectors into a data frame and return
res <- cbind(from, to, t)
return(res)
}
simulate_fmsm(mdl, state = 1)
sim.fmsm
simulates the entire pathway through a multistate model though. Do I understand correctly that you want to simulate data where each individual spends a time period of at least "start"
in every one of their states, so that their sojourn times follow standard parametric distributions that are left-truncated by this time? That seems quite specialised. What is the application?
No, only the current state ('start' in sim.fmsm
) sojourn would be needed. Imagine simulating forward for each individual at an interim analysis in a CT setting. For simulate.flexsurvreg()
this is possible via the start
parameter (see #150) but sim.fmsm()
seems to use a slightly different mechanism for sampling (or I missed the point) and I couldn't find a way to specify the time already spent in the starting state. Also, the 'start' paramter has a different meaning in sim.fmsm()
. I think my example code achieves what I want (going directly via simulate.flexsurvreg()
) but to me the sim.fmsm()
function would be the natural place for this functionality.
Edit: also interesting link to #38.