
Add new multivariate interrupted time series functionality

Open drbenvincent opened this issue 8 months ago • 11 comments

Proposal

Typically, interrupted time series (ITS) designs are univariate: there is a single outcome variable. An existing example in the docs examines the causal impact of the onset of COVID-19 on a single outcome measure, excess deaths.

However, there are plenty of scenarios where we might want to consider multivariate outcomes. Examples might include:

  • Intervention is an advertising campaign, and we want to measure the causal impact on the sales of multiple products.
  • Intervention is some new educational policy, and we want to measure the causal impact on math, reading, and science scores.
  • Intervention is a public health campaign, and we want to measure the causal impact on mental health, BMI, and diet quality.
  • Intervention is a workplace wellness programme, and we want to measure the causal impact on absenteeism, productivity, job satisfaction.

The simplest approach would be to run a separate ITS model for each outcome, assessing the causal impact of the intervention on each one independently. The main limitation is that this assumes the outcome measures are independent, so it fails to capture joint effects. It also means running multiple analyses rather than just one.

Boundary conditions of when to use this approach

The multivariate outcome approach is particularly useful when it is not appropriate to posit a more complex causal structure between the different outcome measures. Assessing the impact of a marketing intervention on the sales of multiple products is perhaps a good example. Multivariate ITS would be appropriate there if the intervention has widespread impacts on the sales of multiple products, and there is no complex causal chain of influence between the products (or we want to remain agnostic about one).

However, if the outcomes are less similar, then it might be possible to think up more complex structural models that describe the causal relationships between the various measures. This might be the case if, for example, the intervention is a public health measure.

General approach

The schematic shows roughly what I'm proposing. We have a number of outcome time series and a single point where an intervention takes place (though this could be relaxed). Because each series is specified with a model formula, all currently available ITS modelling approaches remain usable. We'd then simply generate a counterfactual prediction for each time series.
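As a rough illustration of the per-series counterfactual idea, here is a minimal non-Bayesian sketch in plain Python. It is not CausalPy code: the helper names `fit_linear_trend` and `causal_impact` and the toy data are made up for illustration, and the model is just `y ~ 1 + t` fitted by least squares on the pre-intervention period.

```python
from statistics import mean


def fit_linear_trend(t, y):
    """Least-squares fit of y = intercept + slope * t."""
    t_bar, y_bar = mean(t), mean(y)
    slope = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y)) / sum(
        (ti - t_bar) ** 2 for ti in t
    )
    return y_bar - slope * t_bar, slope


def causal_impact(series, treatment_time):
    """Observed minus counterfactual for each series after the intervention."""
    impacts = {}
    for name, y in series.items():
        t = list(range(len(y)))
        # Fit on pre-intervention data only.
        intercept, slope = fit_linear_trend(t[:treatment_time], y[:treatment_time])
        # Extrapolate the pre-intervention trend into the post-intervention period.
        counterfactual = [intercept + slope * ti for ti in t[treatment_time:]]
        impacts[name] = [
            obs - cf for obs, cf in zip(y[treatment_time:], counterfactual)
        ]
    return impacts


# Hypothetical toy data: the intervention happens at index 4.
series = {
    "prod1_sales": [10.0, 11.0, 12.0, 13.0, 19.0, 20.0],
    "prod2_sales": [5.0, 5.0, 5.0, 5.0, 4.0, 4.0],
}
impacts = causal_impact(series, treatment_time=4)
```

Each series is handled independently here, which is exactly the limitation described above; a joint model would share information across the outcomes.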

[Image: schematic of the proposed approach]

We could either model the variance of each series with its own sigma parameter, or we could model the series jointly with a covariance matrix.
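To make the two options concrete, they could be written roughly as follows. The notation is my own assumption rather than anything fixed in the proposal: $y_t$ is the vector of $K$ outcomes at time $t$, and $\mu_t$ is the corresponding vector of model predictions.

```latex
% Option 1: one sigma per outcome, outcomes conditionally independent
y_{t,k} \sim \mathrm{Normal}\left(\mu_{t,k},\ \sigma_k^2\right), \qquad k = 1, \dots, K

% Option 2: a full K x K covariance matrix across outcomes
y_t \sim \mathrm{MvNormal}\left(\mu_t,\ \Sigma\right)
```

Option 1 is the special case of option 2 with $\Sigma = \mathrm{diag}(\sigma_1^2, \dots, \sigma_K^2)$, so the covariance version is what would let the model capture residual correlation between the outcomes.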

API

We'd stick relatively closely to the current API. The main difference would be that we specify a list of model formulas:

result = cp.pymc_experiments.InterruptedTimeSeries(
    df,
    treatment_time,
    formula=["prod1_sales ~ 1 + t + C(month)",
             "prod2_sales ~ 1 + t + C(month)",
             "prod3_sales ~ 1 + t + C(month)"],
    model=cp.pymc_models.LinearRegression(sample_kwargs={"random_seed": seed}),
)

There would be no requirement for the right-hand sides (RHS) of the formulas to be the same. We may also want to allow passing in a list of models.
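One way the list of formulas might be handled internally is to split each one into its outcome and predictor parts before building the design matrices. The sketch below is only that; `split_formulas` is a hypothetical helper, not an existing CausalPy function, and the second formula deliberately has a different RHS.

```python
def split_formulas(formulas):
    """Map each outcome name (LHS) to its predictor expression (RHS)."""
    parsed = {}
    for formula in formulas:
        lhs, sep, rhs = formula.partition("~")
        outcome = lhs.strip()
        if not sep or not outcome or not rhs.strip():
            raise ValueError(f"Not an 'outcome ~ predictors' formula: {formula!r}")
        if outcome in parsed:
            raise ValueError(f"Duplicate outcome: {outcome!r}")
        parsed[outcome] = rhs.strip()
    return parsed


formulas = [
    "prod1_sales ~ 1 + t + C(month)",
    "prod2_sales ~ 1 + t",  # the RHS does not have to match the others
    "prod3_sales ~ 1 + t + C(month)",
]
parsed = split_formulas(formulas)
```

Keeping one entry per outcome would also make it straightforward to pair each formula with its own model if a list of models is passed in.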

EDIT: If we did want a single model to be applied to all time series, we could perhaps use formulae. @tomicapretto mentions...

In formulae you can do c(y1, y2, y3) ~ x + z and the outcome will be a matrix with as many columns as names in c(). The c() is like c() in R that stands for concatenate

Extensions

  • Add the ability to have graded treatment. This is likely to be fleshed out in another issue/PR, because graded treatment would be a general capability that we could add to multiple quasi-experimental contexts.
  • We should also be able to use specific time series models, not just the LinearRegression class. That could include vector autoregression models, where each series is influenced by past values of the other series.
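To illustrate what "influenced by past values of other series" means, here is a small sketch of a first-order vector autoregression, VAR(1), used for a noise-free mean forecast. The function name `var1_forecast` and all coefficient values are made up for illustration; a real implementation would estimate them from the pre-intervention data.

```python
def var1_forecast(intercept, coef, last_obs, steps):
    """Iterate y_t = intercept + coef @ y_{t-1} forward for `steps` time points."""
    forecasts = []
    prev = list(last_obs)
    n = len(prev)
    for _ in range(steps):
        nxt = [
            intercept[i] + sum(coef[i][j] * prev[j] for j in range(n))
            for i in range(n)
        ]
        forecasts.append(nxt)
        prev = nxt
    return forecasts


# Hypothetical two-series example: series 2 depends on the past of series 1
# through the off-diagonal coefficient 0.25.
intercept = [1.0, 0.0]
coef = [
    [0.5, 0.0],
    [0.25, 0.5],
]
forecasts = var1_forecast(intercept, coef, last_obs=[2.0, 4.0], steps=2)
```

The off-diagonal entries of `coef` are what distinguish this from fitting a separate autoregressive model to each series.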

I'd be very interested to hear if there is any interest in this kind of functionality, or if you think it doesn't add much.

drbenvincent · Jun 17 '24 12:06