lmtp
Formula interface
A current barrier to use is the required data format. A formula interface similar to those already common in R may help. However, the current wide data structure exists to guarantee the time ordering of the hypothetical data generating mechanism, so a formula interface would have to supply the information needed to transform a data set from long format to the proper wide format. A possible alternative data structure would be:
  id time1 time2 x z trt  y cens
1  1     0     1 1 0   0 NA    1
2  1     1     2 1 5   1 NA    1
3  1     2     3 1 2   0  1    1
time1 would indicate the time at which time-varying covariates and time-varying treatment were observed for a row. time2 would indicate the time the potential outcome could be observed. A formula for this data structure could then be:
Y(y, time1, time2, cens) ~ A(trt) + x + L(z)
This would be equivalent to the currently required data structure:
  id x z_0 trt_0 cens_0 z_1 trt_1 cens_1 z_2 trt_2 cens_2 y_3
1  1 1   0     0      1   5     1      1   2     0      1   1
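The long-to-wide conversion itself can be sketched in base R. This is only an illustration under stated assumptions, not how lmtp does or would implement it: it rebuilds the long example by hand, assumes x is constant within id, and assumes every id shares the same final time2. The object names long, wide, final, and last are made up for the example.

```r
# Long data from the example above: one row per id and interval.
long <- data.frame(
  id = c(1, 1, 1), time1 = c(0, 1, 2), time2 = c(1, 2, 3),
  x = c(1, 1, 1), z = c(0, 5, 2), trt = c(0, 1, 0),
  y = c(NA, NA, 1), cens = c(1, 1, 1)
)

# Spread the time-varying columns, suffixing each with its time1 value.
# x is left out of v.names, so reshape() treats it as constant within id.
wide <- reshape(
  long[, c("id", "x", "time1", "z", "trt", "cens")],
  idvar = "id", timevar = "time1",
  v.names = c("z", "trt", "cens"),
  direction = "wide", sep = "_"
)

# Attach the outcome from the final interval, named after its time2 value.
# Assumption: every id shares the same final time2.
final <- max(long$time2)
last <- long[long$time2 == final, c("id", "y")]
names(last)[2] <- paste0("y_", final)
wide <- merge(wide, last, by = "id")

print(names(wide))
```

The time ordering is carried entirely by the column suffixes, which is why both time1 and time2 are needed in the long format.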
The formula would require introducing three new identifiers: Y(), A(), and L(), which would indicate the outcome, exposure, and time-varying covariates, respectively. Variables without an identifier would be considered baseline covariates.
If intermediate outcome variables exist and should be used as time-varying covariates, then we would amend the formula as:
Y(L(y), time1, time2, cens) ~ A(trt) + x + L(z)
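As a rough sketch of how such formulas might be read, the base R snippet below splits a formula into variable roles without evaluating Y(), A(), or L(). The function parse_lmtp_formula and the names of the list elements it returns are hypothetical and not part of lmtp. It assumes Y() always receives exactly four arguments in the order shown above.

```r
# Hypothetical helper (not part of lmtp): split the proposed formula into roles.
parse_lmtp_formula <- function(f) {
  lhs <- f[[2]]  # the unevaluated call Y(...)
  stopifnot(is.call(lhs), identical(lhs[[1]], as.name("Y")))
  args <- as.list(lhs)[-1]
  stopifnot(length(args) == 4)  # assumed order: outcome, time1, time2, cens

  # Y(L(y), ...) flags intermediate outcomes as time-varying covariates.
  out <- args[[1]]
  wrapped <- is.call(out) && identical(out[[1]], as.name("L"))
  if (wrapped) out <- out[[2]]

  # Right-hand side terms as strings, e.g. "A(trt)", "x", "L(z)".
  rhs <- attr(terms(f), "term.labels")
  pattern <- function(id) paste0("^", id, "\\((.*)\\)$")
  unwrap <- function(id) {
    sub(pattern(id), "\\1", grep(pattern(id), rhs, value = TRUE))
  }

  list(
    outcome = deparse(out),
    time1 = deparse(args[[2]]),
    time2 = deparse(args[[3]]),
    cens = deparse(args[[4]]),
    trt = unwrap("A"),
    time_vary = unwrap("L"),
    baseline = rhs[!grepl("^(A|L)\\(", rhs)],  # terms with no identifier
    outcome_is_covariate = wrapped
  )
}

spec <- parse_lmtp_formula(Y(y, time1, time2, cens) ~ A(trt) + x + L(z))
spec2 <- parse_lmtp_formula(Y(L(y), time1, time2, cens) ~ A(trt) + x + L(z))
str(spec)
```

Because the formula is only inspected and never evaluated, Y(), A(), and L() would not need to exist as real functions for this kind of parsing to work.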