lmtp

Formula interface

Open: nt-williams opened this issue 3 years ago • 0 comments

A current barrier to use is the required data format. A formula interface, similar to those already common in R, may help. However, the current data structure exists to guarantee the time ordering of the hypothetical data-generating mechanism, so any formula interface would need to be designed to supply the information necessary to transform a data set from long format to the required wide format. A possible alternative data structure would be:

  id time1 time2 x z trt  y cens
1  1     0     1 1 0   0 NA    1
2  1     1     2 1 5   1 NA    1
3  1     2     3 1 2   0  1    1

time1 would indicate the time at which the time-varying covariates and time-varying treatment in a row were observed. time2 would indicate the time at which the potential outcome could be observed. A formula for this data structure could then be:

Y(y, time1, time2, cens) ~ A(trt) + x + L(z)

This would be equivalent to the currently required wide data structure:

  id x z_0 trt_0 cens_0 z_1 trt_1 cens_1 z_2 trt_2 cens_2 y_3
1  1 1   0     0      1   5     1      1   2     0      1   1
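To make the long-to-wide mapping concrete, here is a minimal sketch in Python, used only for illustration. It is not lmtp's API, and the function long_to_wide and all of its argument names are hypothetical. It assumes that each row covers the interval from time1 to time2, that an id's intervals are contiguous, that baseline covariates are constant within an id, and that only the final row's outcome is kept. Applied to the long data above, it reproduces the column layout of the wide row.

```python
from collections import defaultdict

# Long-format example from this issue; None stands in for R's NA.
long_rows = [
    {"id": 1, "time1": 0, "time2": 1, "x": 1, "z": 0, "trt": 0, "y": None, "cens": 1},
    {"id": 1, "time1": 1, "time2": 2, "x": 1, "z": 5, "trt": 1, "y": None, "cens": 1},
    {"id": 1, "time1": 2, "time2": 3, "x": 1, "z": 2, "trt": 0, "y": 1, "cens": 1},
]


def long_to_wide(rows, outcome, exposure, time_varying, baseline, censoring,
                 start="time1", stop="time2", id_col="id"):
    """Hypothetical helper: one wide row per id, columns suffixed by time."""
    by_id = defaultdict(list)
    for row in rows:
        by_id[row[id_col]].append(row)

    wide = []
    for pid, group in by_id.items():
        group.sort(key=lambda r: r[start])  # enforce time ordering within an id
        # Each interval must start where the previous one stopped.
        for prev, cur in zip(group, group[1:]):
            if cur[start] != prev[stop]:
                raise ValueError(f"id {pid}: intervals are not contiguous")
        out = {id_col: pid}
        for b in baseline:
            if len({r[b] for r in group}) != 1:
                raise ValueError(f"id {pid}: baseline covariate {b} changes over time")
            out[b] = group[0][b]
        # Per time point: covariates, then exposure, then censoring,
        # matching the column order of the wide table above.
        for r in group:
            t = r[start]
            for v in time_varying:
                out[f"{v}_{t}"] = r[v]
            out[f"{exposure}_{t}"] = r[exposure]
            out[f"{censoring}_{t}"] = r[censoring]
        # Only the final row's outcome is kept, indexed by its stop time.
        last = group[-1]
        out[f"{outcome}_{last[stop]}"] = last[outcome]
        wide.append(out)
    return wide


wide = long_to_wide(long_rows, outcome="y", exposure="trt", time_varying=["z"],
                    baseline=["x"], censoring="cens")
print(list(wide[0]))
# ['id', 'x', 'z_0', 'trt_0', 'cens_0', 'z_1', 'trt_1', 'cens_1', 'z_2', 'trt_2', 'cens_2', 'y_3']
```

Sorting on time1 and checking contiguity is one way the reshaping step could enforce the time ordering mentioned above, regardless of the row order a user supplies.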

This would require introducing three new identifiers, Y(), A(), and L(), which would indicate the outcome, exposure, and time-varying covariates, respectively. Y() would also take the time1, time2, and censoring columns. Variables without an identifier would be considered baseline covariates.
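Assigning roles from these identifiers could work roughly as follows. This Python sketch is illustrative only: classify_terms, split_top_level, and the role names are hypothetical and not part of lmtp. It splits the formula on "~", splits the right-hand side on top-level "+", and treats any term without an identifier as a baseline covariate. Nested identifiers inside Y() are not handled here.

```python
import re

# Matches an identifier call such as A(trt) or L(z); hypothetical syntax from the proposal.
TERM = re.compile(r"^([YAL])\((.*)\)$")


def split_top_level(s, sep):
    """Split s on sep, ignoring separators nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def classify_terms(formula):
    """Assign each variable in a proposed lmtp-style formula to a role."""
    lhs, rhs = (side.strip() for side in formula.split("~", 1))
    roles = {"outcome": None, "exposure": [], "time_varying": [], "baseline": []}

    m = TERM.match(lhs)
    if m is None or m.group(1) != "Y":
        raise ValueError("left-hand side must be Y(outcome, time1, time2, cens)")
    args = split_top_level(m.group(2), ",")
    if len(args) != 4:
        raise ValueError("Y() expects outcome, time1, time2, and cens")
    roles["outcome"], roles["start"], roles["stop"], roles["censoring"] = args

    for term in split_top_level(rhs, "+"):
        m = TERM.match(term)
        if m is None:
            roles["baseline"].append(term)  # no identifier: baseline covariate
        elif m.group(1) == "A":
            roles["exposure"].append(m.group(2).strip())
        elif m.group(1) == "L":
            roles["time_varying"].append(m.group(2).strip())
        else:
            raise ValueError(f"Y() is only allowed on the left-hand side: {term}")
    return roles


print(classify_terms("Y(y, time1, time2, cens) ~ A(trt) + x + L(z)"))
```

In R itself, the special-term mechanism of terms() is the natural place to detect such identifiers; the string parsing here only illustrates the role assignment.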

If intermediate outcome variables exist and should be used as time-varying covariates, the formula would be amended to:

Y(L(y), time1, time2, cens) ~ A(trt) + x + L(z)
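Detecting the nested L() inside Y() only requires inspecting the first argument. The Python sketch below is illustrative; parse_outcome_term and its returned keys are hypothetical. It returns the outcome name together with a flag indicating whether intermediate values should also be treated as time-varying covariates.

```python
import re

# Matches an outcome wrapped in L(), e.g. L(y); hypothetical syntax from the proposal.
WRAPPED = re.compile(r"^L\((\w+)\)$")


def parse_outcome_term(term):
    """Parse Y(<outcome>, time1, time2, cens), where <outcome> may be wrapped in L()."""
    if not (term.startswith("Y(") and term.endswith(")")):
        raise ValueError("expected Y(...)")
    inner = term[2:-1]
    # L(y) contains no comma, so a plain split on "," separates the four arguments.
    first, *rest = [part.strip() for part in inner.split(",")]
    if len(rest) != 3:
        raise ValueError("Y() expects outcome, time1, time2, and cens")
    m = WRAPPED.match(first)
    return {
        "outcome": m.group(1) if m else first,
        "outcome_is_time_varying": m is not None,
        "start": rest[0],
        "stop": rest[1],
        "censoring": rest[2],
    }


print(parse_outcome_term("Y(L(y), time1, time2, cens)"))
```

One possible reading, stated here as an assumption, is that when the flag is set the reshaping step would also emit the outcome from each non-final row (for example y_1 and y_2) as a covariate, instead of keeping only the final outcome.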

nt-williams, Nov 22 '20 22:11