
A multitude of models for potential inclusion

Open fwimp opened this issue 1 year ago • 9 comments

A new paper from Georgios Kontopoulos et al. (link) contains 83 TPC models from a variety of sources, including some that are already in rTPC but many that are not.

Frankly, incorporating all of these (should that be desired) will be quite a big project, but it is definitely feasible for me to work on at least a subset of them. (I'd like to wait for any appropriate feedback on #58 first, just to make sure I have the process for adding models down before actually making an attempt.)

As an overview, these are the top 10 models from their candidate list based on the fitting in the paper, so they could be a good place to start.

Top 10 models

  • Mitchell-Angilletta (seems to cause issues with the cos() part. Leaving for now)
  • ~~Atkin~~
  • ~~Eubank~~
  • ~~Analytis-Kontodimas~~
  • ~~Taylor-Sexton~~
  • ~~Janisch I~~
  • ~~Simplified B type~~
  • ~~Simplified Briere I~~
  • ~~Warren-Dreyer~~
  • ~~Tomlinson-Phillips~~

Notes

  • It may be easiest to prioritise models that have few parameters and do not rely on complex response-switching behaviours.
  • There's also definitely an argument for prioritising functions whose formulation differs most from the currently included models.

Full table of models

| Model | Free Parameters | Switched? | Rank in Paper | Working on | Location |
| --- | --- | --- | --- | --- | --- |
| Analytis-Allahyari | 5 | | 47 | | |
| Analytis-Kontodimas | 3 | | 5 | Francis | fwimp@17c844c, #64 |
| Asbury-Angilletta | 6 | | 55 | | |
| Simplified Asbury-Angilletta | 4 | | 52 | | |
| Ashrafi I | 3 | | 27 | | LIVE |
| Ashrafi II | 3 | | 14 | Francis | fwimp@f6e5d85, #64 |
| Ashrafi III | 3 | | 68 | Dan | padpadpadpad@9125b77 |
| Ashrafi IV | 3 | | 70 | Dan | padpadpadpad@9125b77 |
| Ashrafi V | 4 | | 62 | Dan | padpadpadpad@9125b77 |
| Atkin | 3 | | 2 | | LIVE |
| Bilinear | 4 | 1 | 38 | | |
| Modified Bilinear | 6 | 1 | 72 | | |
| Boatman | 5 | | 43 | | LIVE |
| Briere I | 3 | | 13 | Francis | fwimp@ccf1ff3, #64 |
| Simplified Briere I | 3 | | 9 | Francis | fwimp@ccf1ff3, #64 |
| Briere II | 4 | | 35 | | LIVE |
| Simplified Briere II | 4 | | 28 | Francis | fwimp@ccf1ff3, #64 |
| Extended Briere | 5 | | 79 | Francis | fwimp@ccf1ff3, #64 |
| Simplified Extended Briere | 5 | | 76 | Francis | fwimp@ccf1ff3, #64 |
| Cardinal Temperature | 4 | | 29 | Dan | padpadpadpad@e79b181 |
| Cardinal Temperature with Inflection | 4 | | 22 | | LIVE, padpadpadpad@111c193 |
| Dent-like | 5 | 1 | 66 | | |
| Modified dent-like | 7 | 1 | 80 | | |
| Modified Deutsch | 4 | 1 | 34 | | |
| Enzyme-assisted Arrhenius | 5 | | 48 | | |
| Eubank | 3 | | 3 | Francis | fwimp@17c844c, #64 |
| Finstad-Jonsson | 4 | | 33 | | |
| Gaussian | 3 | | 4 | | LIVE |
| Double Gaussian | 4 | 1 | 78 | | |
| Modified Gaussian | 4 | | 16 | | LIVE |
| Exponentially modified Gaussian | 5 | | 49 | | |
| Gaussian-Gompertz | 5 | | 44 | | |
| Hinshelwood | 4 | | 82 | | LIVE |
| Hobbs | 4 | | 45 | | |
| Huey-Stevenson | 5 | | 50 | | |
| Janisch I | 3 | | 7 | Francis | fwimp@17c844c, #64 |
| Janisch II | 4 | | 21 | Francis | fwimp@17c844c, #64 |
| Johnk | 5 | | 40 | | LIVE |
| Johnson-Lewin | 4 | | 26 | | LIVE |
| Extended Johnson-Lewin | 5 | | 64 | | |
| Simplified Johnson-Lewin | 4 | | 32 | | |
| Simplified extended Johnson-Lewin | 5 | | 60 | | |
| Kumaraswamy | 5 | | 41 | | |
| Lactin I | 3 | | 56 | | |
| Lactin II | 4 | | 69 | | LIVE |
| Linear-logistic | 5 | | 51 | | |
| Logan I | 4 | | 42 | | |
| Logan II | 5 | | 46 | | |
| Logan III | 4 | | 81 | | |
| Mitchell-Angilletta | 3 | | 1 | | |
| Newbery | 4 | | 58 | | |
| O'Neill | 4 | | 19 | | LIVE |
| Second-order polynomial | 3 | | 11 | | LIVE |
| Third-order polynomial | 4 | | 36 | | |
| Fourth-order polynomial | 5 | | 61 | | |
| Fifth-order polynomial | 6 | | 73 | | |
| Ratkowsky | 4 | | 30 | | LIVE |
| Regniere | 6 | | 67 | | |
| Rezende-Bozinovic | 4 | 1 | 31 | | LIVE |
| Rice Clock | 6 | 1 | 65 | | |
| Ritchie | 4 | | 63 | | |
| Ross-Ratkowsky | 5 | | 39 | | |
| Ruiz | 4 | | 23 | | |
| Sharpe-Schoolfield | 6 | | 54 | | LIVE |
| Extended Sharpe-Schoolfield | 7 | | 74 | | |
| Simplified Sharpe-Schoolfield | 6 | | 53 | | |
| Simplified extended Sharpe-Schoolfield | 7 | | 75 | | |
| Simplified B type | 3 | | 8 | Francis | fwimp@17c844c, #64 |
| Skew-normal | 4 | | 17 | | |
| Stevenson | 6 | | 83 | | |
| Stinner | 4 | 1 | 18 | Dan | padpadpadpad@a6c7d9c |
| Taylor-Sexton | 3 | | 6 | Francis | fwimp@17c844c, #64 |
| Thomas I | 4 | | 20 | | LIVE |
| Thomas II | 5 | | 57 | | LIVE |
| Thornton-Lessem | 6 | | 59 | | |
| Tomlinson-Menz | 4 | | 37 | | |
| Tomlinson-Phillips | 3 | | 12 | Francis | fwimp@17c844c, #64 |
| Van't Hoff | 4 | | 71 | | |
| Wang-Engel | 4 | | 24 | | |
| Wang-Lan-Ding | 7 | | 77 | | |
| Warren-Dreyer | 3 | | 10 | Francis | fwimp@17c844c, #64 |
| Weibull | 4 | | 15 | | LIVE |
| Yan-Hunt | 4 | | 25 | | |

fwimp · Dec 12 '24

To add to this, the SI provides the definitions of all functions listed.

Additionally, the implementations of these in the original paper give a good clue as to starting values and sensible limits.

fwimp · Dec 13 '24

Hi @fwimp. Thanks for starting this.

I had started a similar spreadsheet table here: https://docs.google.com/spreadsheets/d/1YTbt3cemVNIbp_67vRe5s0QyUzkoErBvm7SdIv_5xhw/edit?gid=0#gid=0

What does the switched column mean?

The two Thomas models are definitely in there, so I have updated your table.

padpadpadpad · Dec 15 '24

Switched means that the function has two or more functional forms (so if temp < topt, do x, if temp >= topt, do y).

They're just slightly more complex to implement in a vectorised manner, so that could be a factor in deciding. If it's of no concern, we can just strip out that column.
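For illustration, here's a minimal sketch of how a switched (piecewise) form can still stay vectorised; the model form and parameter names here are invented for the example, not one of the models in the table:

```r
# Hypothetical piecewise TPC (parameter names invented for illustration).
# ifelse() evaluates both branches over the whole temp vector and picks
# element-wise, so the switched form stays fully vectorised.
switched_example <- function(temp, rmax, topt, a, b) {
  ifelse(temp < topt,
         rmax * exp(-a * (topt - temp)),   # below-optimum branch
         rmax * exp(-b * (temp - topt)))   # at/above-optimum branch
}

switched_example(seq(10, 40, 5), rmax = 1, topt = 25, a = 0.05, b = 0.2)
```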

fwimp · Dec 15 '24

Some helpful functions for use in testing when adding a lot of models at once!

(You'll want to load current versions of rTPC and nls.multstart, i.e. a build of nls.multstart that has lhstype as an option.)
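For reference, something like this pulls in everything the helpers below rely on (the GitHub locations are my assumption of where the development versions live):

```r
# development versions with the lhstype argument (assumed GitHub locations)
# remotes::install_github("padpadpadpad/rTPC")
# remotes::install_github("padpadpadpad/nls.multstart")
library(rTPC)
library(nls.multstart)
library(ggplot2)  # plotting
library(broom)    # augment() for model predictions
library(dplyr)    # mutate(), select(), pipes
library(tidyr)    # nest(), pivot_longer(), unnest()
library(purrr)    # map(), map2()
```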

get_tpc <- function(tpcname, response){
  tpcfunc <- get(tpcname)
  tpcargs <- names(formals(tpcfunc))
  return(as.formula(paste0(response, "~", paste0(tpcname, "( temp = temp, ", paste(tpcargs[tpcargs != "temp"], collapse = ", "), ")"))))
}
quickfit_tpc <- function(modelname, iter=200, id=1, makegraph=TRUE){
  response = "rate"
  data('chlorella_tpc')
  d <- subset(chlorella_tpc, curve_id == id)
  start_vals <- get_start_vals(d$temp, d$rate, model_name = modelname)
  mod <- suppressWarnings(nls_multstart(get_tpc(modelname, response),
                                      data = d,
                                      iter = iter,
                                      start_lower = start_vals * 0.5,
                                      start_upper = start_vals * 1.5,
                                      lower = get_lower_lims(d$temp, d$rate, model_name = modelname),
                                      upper = get_upper_lims(d$temp, d$rate, model_name = modelname),
                                      supp_errors = 'Y',
                                      convergence_count = FALSE, lhstype = "random"))
  if (makegraph) {
    preds <- broom::augment(mod)
    p <- ggplot(preds) +
      geom_point(aes(temp, rate)) +
      geom_line(aes(temp, .fitted)) +
      labs(title = paste(modelname, "on curve", id)) +
      theme_bw()
    print(p)
  }
  return(mod)
}
quickfit_dataset <- function(modelnames, iter=200){
  response = "rate"
  data("chlorella_tpc")
  d <- chlorella_tpc
  d_fits <- nest(d, data = c(temp, rate))
  for (model in modelnames) {
    d_fits <- d_fits %>% mutate(!! model := map(data, ~nls_multstart(get_tpc(model, response),
                        data = .x,
                        iter = iter,
                        start_lower = get_start_vals(.x$temp, .x$rate, model_name = model) * 0.5,
                        start_upper = get_start_vals(.x$temp, .x$rate, model_name = model) * 1.5,
                        lower = get_lower_lims(.x$temp, .x$rate, model_name = model),
                        upper = get_upper_lims(.x$temp, .x$rate, model_name = model),
                        supp_errors = 'Y',
                        convergence_count = FALSE)))
  }
  # create new list column for high resolution data
  d_preds <- mutate(d_fits, new_data = map(data, ~tibble(temp = seq(min(.x$temp), max(.x$temp), length.out = 100)))) %>%
    # get rid of original data column
    select(., -data) %>%
    # stack models into a single column, with an id column for model_name
    pivot_longer(., names_to = 'model_name', values_to = 'fit', contains(modelnames)) %>%
    # create new list column containing the predictions
    # this uses both fit and new_data list columns
    mutate(preds = map2(fit, new_data, ~augment(.x, newdata = .y))) %>%
    # select only the columns we want to keep
    select(curve_id, growth_temp, process, flux, model_name, preds) %>%
    # unlist the preds list column
    unnest(preds)
  
  p <- ggplot(d_preds) +
  geom_line(aes(temp, .fitted, col = model_name)) +
  geom_point(aes(temp, rate), d) +
  facet_wrap(~curve_id, scales = 'free_y', ncol = 6) +
  theme_bw() +
  # theme(legend.position = 'none') +
  scale_color_brewer(type = 'qual', palette = 2) +
  labs(x = 'Temperature (ºC)',
       y = 'Metabolic rate',
       title = 'All fitted thermal performance curves',
       subtitle = paste(modelnames, collapse=", "),
       col="Model")
  
  print(p)
}

Used as quickfit_tpc("flextpc_2024", id=3, iter=200) or quickfit_dataset("flextpc_2024", iter=200).

These are pretty useful for quickly checking that fitting is working vaguely sensibly. :)

Edit: quickfit_dataset() can now take a vector of model names and fit each one.
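For example, with placeholder model names standing in for whichever suffixed names the new functions end up with:

```r
# compare several candidate models on every curve (names hypothetical)
quickfit_dataset(c("model_a_2024", "model_b_2024"), iter = 200)
```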

fwimp · Dec 18 '24

Thanks again @fwimp. I need to work out a bit of rlang to fully understand what !! and := are doing, but they definitely feel very powerful.

See you added 8 more models! You are on a festive roll.

padpadpadpad · Dec 19 '24

Not a problem! Honestly with the new architecture and tests, adding models has become kind of a breeze. Took a few hours to do the 8 yesterday :)

The process I'm sort of following is:

  1. usethis::use_r("modelname_yyyy") to init the model
  2. usethis::use_test() to init the equivalent test (with the new model open in RStudio)
  3. Copy over model/test code from previous model/test pair and fix up all the names to be the new model
  4. Redo the model code and starting vals (a rough skeleton is sketched after this list)
  5. Test-run using the functions above
  6. When that's basically working, re-write test file
  7. Run the test file with the dput(...) line uncommented to get the fit values for the test, and insert them back into the test file.
  8. Run test-startingvalues.R to check that all elements of the model are actually implemented (as nls.multstart will often fit fine even when they're not, it seems).
  9. Document new model, devtools::document(), and check manually.
  10. R CMD CHECK to make sure everything is up to scratch.
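As a rough illustration of steps 3-4, a new model file broadly takes this shape; the model name, formula, and parameters below are made-up placeholders rather than a real rTPC model:

```r
# R/examplemodel_2024.R -- hypothetical skeleton, not a real rTPC model
examplemodel_2024 <- function(temp, rmax, topt, a) {
  # return the predicted rate at each temperature (vectorised over temp)
  rmax * exp(-((temp - topt) / (2 * a))^2)
}

# get_start_vals(), get_lower_lims() and get_upper_lims() then need to know
# about "examplemodel_2024" so nls_multstart() can be seeded and bounded.
```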

In terms of !!, this is some pretty fun metaprogramming stuff. !! is the injection operator, which substitutes a variable from the current environment as a data-variable. Looking at it today I think you could do the same thing with the .env pronoun, but this is only developer-auxiliary code so I'm not too worried about absolute best practice for it. Whenever you're using things like the rlang and dplyr metaprogramming tools, you have to use the := walrus operator over the standard = for unspecified technical reasons.

I think you could also do this using glue strings as of rlang v0.4.3, which I frankly only just discovered. But in this case any way that gets the job done is good with me!
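To make that concrete, here's a tiny self-contained sketch (column and variable names invented) showing the injection form used above alongside the glue-string form:

```r
library(dplyr)
library(purrr)

d <- tibble::tibble(data = list(1:3, 4:6))
newcol <- "model_x"  # hypothetical column name held as a string

# injection operator + walrus: use the value of newcol as the new column name
d %>% mutate(!!newcol := map(data, sum))

# equivalent glue-string form (rlang >= 0.4.3)
d %>% mutate("{newcol}" := map(data, sum))
```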

fwimp · Dec 19 '24

I am going to try to add a couple of models tomorrow! I see you lurking @mhasoba.

Thanks for all the help @fwimp!

padpadpadpad · Dec 19 '24

Not a problem at all!

Just finished all the Brière variants (though honestly I think they might benefit from some massaging of their starting values, as the fits are good but the estimated parameters are sometimes a bit off for the higher-parameter versions). Gonna go on to the Ashrafi variants and see if I can bash those out before I head off to see the family for Christmas.

fwimp · Dec 19 '24

Well, looks like we've now doubled the number of included models from 24 to 48! That's pretty cool :)

I'm not sure how essential the remaining ones are, but always worth chipping away at if there's anything that looks interesting.

I'm going to have a stab at #65 so that if people want other models from here for their own work, contributing them upstream should be easy.

fwimp · Jan 14 '25