mcp icon indicating copy to clipboard operation
mcp copied to clipboard

Ideas for the far future

Open lindeloev opened this issue 6 years ago • 7 comments

Here are some ideas which could use some discussion and careful consideration. It extends the current model specification: https://lindeloev.github.io/mcp/articles/formulas.html

In the order from "soonish" (top) to "in your dreams":

Survival models

Survival models are relatively simple and we should support them, including censoring too. The API for the model itself would be something like brms:

model = list(
    eventtime | cens(status) ~ 0,
    ~ 0 + x
)

It should support both exponential decay and Cox proportional hazards. This would probably be specified via the mcp(..., family = ) argument, but I'm unsure what would be the best.

Slope on change points

If there are multiple (piecewise) lines over a single change point, and each line is associated with a different parameter x, we can use that to predict the change point. For example, we could assess subitizing in participants of varying age, and it would be reasonable to expect the subitizing range (location of first change point) to increase with age in childhood and decrease with age in adulthood.

How to implement this in a formula, I'm unsure. Maybe it has to be in the random effect: (1 + x | age) since that specifies the grouping of multiple lines. This would also ensure that the parameter names stay intact. cp_i, cp_i_age, and then probably cp_i_x.

Multivariate regression

Multiple response variables predicted from single change points. Something like

model = list(
    c(y1, y2) ~ 1 + x,  # segment 1
    ~ 0  # seg2: joined intercept
)

This could be merged with "Variance change (Specify y)" to specify a change in one response but not the other:

model = list(
    c(y1, y2) ~ 1 + x,  # segment 1
    ~ 0  # seg2: joined intercept on y1 and y2, segment 2
    y1 ~ 1 ~ x  # seg3: slope change in y1 but not y2
)

ARIMA

mcp currently supports AR(N) models. It should go more general. Take a look at:

  • https://www.apress.com/gp/blog/all-blog-posts/simulating-autoregressive-and-moving-average-time-series-in-r/16711624
  • https://github.com/andrewcparnell/jags_examples

lindeloev avatar Oct 13 '19 15:10 lindeloev

I've just come across this package whilst coding up some changepoint models directly. The specific application I've been working in is word/letter frequencies for stylometry. I'm interested to know if categorical/multinomial models could fit into this framework?

(This package looks amazing, by the way!)

josswright avatar Mar 12 '20 11:03 josswright

@josswright Yes, mathematically and in JAGS it's straightforward to extend mcp to include multinomial models, as long as there is a continuous predictor. Just need mcp to count the number of categories in the dependent variable (N), to set up a logistic model for each of them (dbern), and to set up a proper parameterization of the priors on the regression coefficients (probably a dirichlet). The change point will then define a location on x where the N-1 regression coefficients change.

I hope to get time to do this after adding survival analysis, but any pull requests are welcome :-)

lindeloev avatar Mar 12 '20 11:03 lindeloev

@lindeloev Fantastic! I'm more familiar with Stan than JAGS, but if I find the time then I'll definitely take a look!

josswright avatar Mar 18 '20 10:03 josswright

I'm working on abrupt time series forecasting problem and interested in change point/anomaly detection integrated with forecasting algorithms. This package is very promising. Just curious, are there particular reasons to choose JAGS over Stan as the underlying Bayesian software? IMO, stan gains more popularity and extensibility in the community, like "prophet", "tidybayes" and "shinystan". I'm wondering whether it would be considered to include an option to choose "stan" as a backend so we can get a stan model output to integrate with other "stan" based packages.

jpzhangvincent avatar May 17 '20 19:05 jpzhangvincent

@jpzhangvincent I chose JAGS because:

  1. Ease of installation. I have had many students struggle installing RTools. I myself recently tried installing RTools4 without success.
  2. Total speed. Many problems are solved in ~2-5 seconds in JAGS. But will take 30+ seconds because of compilation with stan. I did see that stan was working on some improvements that will bring down compilation time down considerably, though (?),
  3. I'm more familiar with JAGS :-)

If mcp is beginning to get more widely adopted, I would definitely add stan support for the reasons your state! I anticipated this early on so it would not require a major re-write. Only a few hundred of the ~2000 lines of code in mcp are JAGS-specific and the others should be quite sampler-agnostic.

I'd be happy to hear your thoughts here!

lindeloev avatar May 17 '20 21:05 lindeloev

@jpzhangvincent I chose JAGS because:

  1. Ease of installation. I have had many students struggle installing RTools. I myself recently tried installing RTools4 without success.
  2. Total speed. Many problems are solved in ~2-5 seconds in JAGS. But will take 30+ seconds because of compilation with stan. I did see that stan was working on some improvements that will bring down compilation time down considerably, though (?),
  3. I'm more familiar with JAGS :-)

If mcp is beginning to get more widely adopted, I would definitely add stan support for the reasons your state! I anticipated this early on so it would not require a major re-write. Only a few hundred of the ~2000 lines of code in mcp are JAGS-specific and the others should be quite sampler-agnostic.

I'd be happy to hear your thoughts here!

Thanks for your response! I'd like to deep dive into the codebase and see anything I can contribute to. Can you give me some pointers where I can better understand how the model is specified in JAGS?

jpzhangvincent avatar May 18 '20 05:05 jpzhangvincent

@jpzhangvincent Yes, definitely!

fit = mcp(...) returns an mcpfit object and you can see the JAGS code using cat(fit$jags_code).

This code is mostly generated in R/get_jagscode.R and R/get_formula.R using the abstract representation (the "Segment table") which is also saved to fit$.other$ST - one row per segment. ST is not a fixed API but will be in flux as new features are added and old features are refined.

Adding support for stan would be the business of making a get_stancode.R and add a function get_formula_str_stan to the get_formula.R file.

lindeloev avatar May 18 '20 07:05 lindeloev