A multitude of models for potential inclusion
A new paper from Georgios Kontopoulos et al. link contains 83 TPC models from a variety of sources. This includes some that are already in rTPC, but many that are not.
This will be frankly quite a big project to incorporate all of these (should that be desired), but it also is definitely feasible for me to work on at least incorporating a subset of them. (I'd like to wait until I see any appropriate feedback on #58 just to make sure I get the process for adding models before actually making an attempt).
As an overview, based on the fitting in the paper these are the top 10 models of their candidate list, so could be a good place to start.
Top 10 models
- Mitchell-Angilletta (seems to cause issues with the
cos()part. Leaving for now) - ~~Atkin~~
- ~~Eubank~~
- ~~Analytis-Kontodimas~~
- ~~Taylor-Sexton~~
- ~~Janisch I~~
- ~~Simplified B type~~
- ~~Simplified Briere I~~
- ~~Warren-Dreyer~~
- ~~Tomlinson-Phillips~~
Notes
- It may be easiest to prioritise models that have few parameters and do not rely on complex response-switching behaviours.
- Also there's definitely an argument to be made for prioritising functions that are the most different in formulation from currently included models
Full table of models
| Model | Free Parameters | Switched? | Rank in Paper | Working on | Location |
|---|---|---|---|---|---|
| Analytis-Allahyari | 5 | 47 | |||
| Analytis-Kontodimas | 3 | 5 | Francis | fwimp@17c844c, #64 | |
| Asbury-Angilletta | 6 | 55 | |||
| Simplified Asbury-Angilletta | 4 | 52 | |||
| Ashrafi I | 3 | 27 | LIVE | ||
| Ashrafi II | 3 | 14 | Francis | fwimp@f6e5d85, #64 | |
| Ashrafi III | 3 | 68 | Dan | padpadpadpad@9125b77 | |
| Ashrafi IV | 3 | 70 | Dan | padpadpadpad@9125b77 | |
| Ashrafi V | 4 | 62 | Dan | padpadpadpad@9125b77 | |
| Atkin | 3 | 2 | LIVE | ||
| Bilinear | 4 | 1 | 38 | ||
| Modified Bilinear | 6 | 1 | 72 | ||
| Boatman | 5 | 43 | LIVE | ||
| Briere I | 3 | 13 | Francis | fwimp@ccf1ff3, #64 | |
| Simplified Briere I | 3 | 9 | Francis | fwimp@ccf1ff3, #64 | |
| Briere II | 4 | 35 | LIVE | ||
| Simplified Briere II | 4 | 28 | Francis | fwimp@ccf1ff3, #64 | |
| Extended Briere | 5 | 79 | Francis | fwimp@ccf1ff3, #64 | |
| Simplified Extended Briere | 5 | 76 | Francis | fwimp@ccf1ff3, #64 | |
| Cardinal Temperature | 4 | 29 | Dan | padpadpadpad@e79b181 | |
| Cardinal Temperature with Inflection | 4 | 22 | LIVE | padpadpadpad@111c193 | |
| Dent-like | 5 | 1 | 66 | ||
| Modified dent-like | 7 | 1 | 80 | ||
| Modified Deutsch | 4 | 1 | 34 | ||
| Enzyme-assisted Arrhenius | 5 | 48 | |||
| Eubank | 3 | 3 | Francis | fwimp@17c844c, #64 | |
| Finstad-Jonsson | 4 | 33 | |||
| Gaussian | 3 | 4 | LIVE | ||
| Double Gaussian | 4 | 1 | 78 | ||
| Modified Gaussian | 4 | 16 | LIVE | ||
| Exponentially modified Gaussian | 5 | 49 | |||
| Gaussian-Gompertz | 5 | 44 | |||
| Hinshelwood | 4 | 82 | LIVE | ||
| Hobbs | 4 | 45 | |||
| Huey-Stevenson | 5 | 50 | |||
| Janisch I | 3 | 7 | Francis | fwimp@17c844c, #64 | |
| Janisch II | 4 | 21 | Francis | fwimp@17c844c, #64 | |
| Johnk | 5 | 40 | LIVE | ||
| Johnson-Lewin | 4 | 26 | LIVE | ||
| Extended Johnson-Lewin | 5 | 64 | |||
| Simplified Johnson-Lewin | 4 | 32 | |||
| Simplified extended Johnson-Lewin | 5 | 60 | |||
| Kumaraswamy | 5 | 41 | |||
| Lactin I | 3 | 56 | |||
| Lactin II | 4 | 69 | LIVE | ||
| Linear-logistic | 5 | 51 | |||
| Logan I | 4 | 42 | |||
| Logan II | 5 | 46 | |||
| Logan III | 4 | 81 | |||
| Mitchell-Angilletta | 3 | 1 | |||
| Newbery | 4 | 58 | |||
| O'Neill | 4 | 19 | LIVE | ||
| Second-order polynomial | 3 | 11 | LIVE | ||
| Third-order polynomial | 4 | 36 | |||
| Fourth-order polynomial | 5 | 61 | |||
| Fifth-order polynomial | 6 | 73 | |||
| Ratkowsky | 4 | 30 | LIVE | ||
| Regniere | 6 | 67 | |||
| Rezende-Bozinovic | 4 | 1 | 31 | LIVE | |
| Rice Clock | 6 | 1 | 65 | ||
| Ritchie | 4 | 63 | |||
| Ross-Ratkowsky | 5 | 39 | |||
| Ruiz | 4 | 23 | |||
| Sharpe-Schoolfield | 6 | 54 | LIVE | ||
| Extended Sharpe-Schoolfield | 7 | 74 | |||
| Simplified Sharpe-Schoolfield | 6 | 53 | |||
| Simplified extended Sharpe-Schoolfield | 7 | 75 | |||
| Simplified B type | 3 | 8 | Francis | fwimp@17c844c, #64 | |
| Skew-normal | 4 | 17 | |||
| Stevenson | 6 | 83 | |||
| Stinner | 4 | 1 | 18 | Dan | padpadpadpad@a6c7d9c |
| Taylor-Sexton | 3 | 6 | Francis | fwimp@17c844c, #64 | |
| Thomas I | 4 | 20 | LIVE | ||
| Thomas II | 5 | 57 | LIVE | ||
| Thornton-Lessem | 6 | 59 | |||
| Tomlinson-Menz | 4 | 37 | |||
| Tomlinson-Phillips | 3 | 12 | Francis | fwimp@17c844c, #64 | |
| Van't Hoff | 4 | 71 | |||
| Wang-Engel | 4 | 24 | |||
| Wang-Lan-Ding | 7 | 77 | |||
| Warren-Dreyer | 3 | 10 | Francis | fwimp@17c844c, #64 | |
| Weibull | 4 | 15 | LIVE | ||
| Yan-Hunt | 4 | 25 |
To add to this, the SI provides the definitions of all functions listed.
Additionally the implementation of these in the original paper give a good clue as to starting values and sensible limits.
Hi @fwimp. Thanks for starting this.
I had started a similar spreadsheet table here: https://docs.google.com/spreadsheets/d/1YTbt3cemVNIbp_67vRe5s0QyUzkoErBvm7SdIv_5xhw/edit?gid=0#gid=0
What does the switched column mean?
The two Thomas models are definitely in there so have updated your table.
Switched means that the function has two or more functional forms (so if temp < topt, do x, if temp >= topt, do y).
Just slightly more complex to implement in a vectorised manner, so something that could be a factor in deciding. If that is of no concern we can just strip out said column
Some helpful functions for use in testing when adding a lot of models at once!
(you'll want to load current versions of rTPC and nls.multstart [i.e. one with lhstype as an option])
get_tpc <- function(tpcname, response){
tpcfunc <- get(tpcname)
tpcargs <- names(formals(tpcfunc))
return(as.formula(paste0(response, "~", paste0(tpcname, "( temp = temp, ", paste(tpcargs[tpcargs != "temp"], collapse = ", "), ")"))))
}
quickfit_tpc <- function(modelname, iter=200, id=1, makegraph=TRUE){
response = "rate"
data('chlorella_tpc')
d <- subset(chlorella_tpc, curve_id == id)
start_vals <- get_start_vals(d$temp, d$rate, model_name = modelname)
mod <- suppressWarnings(nls_multstart(get_tpc(modelname, response),
data = d,
iter = iter,
start_lower = start_vals * 0.5,
start_upper = start_vals * 1.5,
lower = get_lower_lims(d$temp, d$rate, model_name = modelname),
upper = get_upper_lims(d$temp, d$rate, model_name = modelname),
supp_errors = 'Y',
convergence_count = FALSE, lhstype = "random"))
if (makegraph) {
preds <- broom::augment(mod)
p <- ggplot(preds) +
geom_point(aes(temp, rate)) +
geom_line(aes(temp, .fitted)) +
labs(title=paste(modelname, "on curve", id))
theme_bw()
print(p)
}
return(mod)
}
quickfit_dataset <- function(modelnames, iter=200){
response = "rate"
data("chlorella_tpc")
d <- chlorella_tpc
d_fits <- nest(d, data = c(temp, rate))
for (model in modelnames) {
d_fits <- d_fits %>% mutate(!! model := map(data, ~nls_multstart(get_tpc(model, response),
data = .x,
iter = iter,
start_lower = get_start_vals(.x$temp, .x$rate, model_name = model) * 0.5,
start_upper = get_start_vals(.x$temp, .x$rate, model_name = model) * 1.5,
lower = get_lower_lims(.x$temp, .x$rate, model_name = model),
upper = get_upper_lims(.x$temp, .x$rate, model_name = model),
supp_errors = 'Y',
convergence_count = FALSE)))
}
# create new list column of for high resolution data
d_preds <- mutate(d_fits, new_data = map(data, ~tibble(temp = seq(min(.x$temp), max(.x$temp), length.out = 100)))) %>%
# get rid of original data column
select(., -data) %>%
# stack models into a single column, with an id column for model_name
pivot_longer(., names_to = 'model_name', values_to = 'fit', contains(modelnames)) %>%
# create new list column containing the predictions
# this uses both fit and new_data list columns
mutate(preds = map2(fit, new_data, ~augment(.x, newdata = .y))) %>%
# select only the columns we want to keep
select(curve_id, growth_temp, process, flux, model_name, preds) %>%
# unlist the preds list column
unnest(preds)
p <- ggplot(d_preds) +
geom_line(aes(temp, .fitted, col = model_name)) +
geom_point(aes(temp, rate), d) +
facet_wrap(~curve_id, scales = 'free_y', ncol = 6) +
theme_bw() +
# theme(legend.position = 'none') +
scale_color_brewer(type = 'qual', palette = 2) +
labs(x = 'Temperature (ºC)',
y = 'Metabolic rate',
title = 'All fitted thermal performance curves',
subtitle = paste(modelnames, collapse=", "),
col="Model")
print(p)
}
used as quickfit_tpc("flextpc_2024", id=3, iter=200) or quickfit_dataset("flextpc_2024", iter=200)
These are pretty useful for quickly checking that fitting is working vaguely sensible. :)
Edit: quickfit_dataset() can now take a vector of model names and fit each one.
thanks again @fwimp. Need to work out a bit of rlang to understand fully what !! and := are doing, but they definitely feel very powerful.
See you added 8 more models! You are on a festive roll.
Not a problem! Honestly with the new architecture and tests, adding models has become kind of a breeze. Took a few hours to do the 8 yesterday :)
The process I'm sort of following is:
usethis::use_r("modelname_yyyy")to init the modelusethis::use_test()to init the equivalent test (with the new model open in Rstudio)- Copy over model/test code from previous model/test pair and fix up all the names to be the new model
- Redo the model code and starting vals
- Test-run using the functions above
- When that's basically working, re-write test file
- Run test file with the
dput(...)line uncommented to get the fit values for the test and insert back in to the test file. - Run
test-startingvalues.Rto check that all elements of the model are actually implemented (as nls.multfit will often work fine even when they're not working it seems) - Document new model,
devtools::document(), and check manually. - R CMD CHECK to make sure everything is up to scratch.
In terms of !!, this is some pretty fun metaprogramming stuff. !! is the injection operator, which substitutes a variable from the current environment as a data-variable. Looking at it today I think you could do the same thing with the .env pronoun, but this is only developer-auxiliary code so I'm not too worried about absolute best practice for it. Whenever you're using things like the rlang and dplyr metaprogramming tools, you have to use the := walrus operator over the standard = for unspecified technical reasons.
I think you could also do this using glue strings as of rlang v0.4.3, which I frankly only just discovered. But in this case any way that gets the job done is good with me!
I am going to try add a couple of models tomorrow! I see you lurking @mhasoba.
Thanks for all the help @fwimp!
Not a problem at all!
Just finished all the Brière variants (though honestly I think they might enjoy some starting values massaging as the fits are good but the estimated parameters are sometimes a bit off on higher parameter versions). Gonna go onto the Ashrafi variants and see if I can bash those out before I head off to see the family for christmas.
Well, looks like we've now doubled the number of included models from 24 to 48! That's pretty cool :)
I'm not sure how essential the remaining ones are, but always worth chipping away at if there's anything that looks interesting.
I'm going to have a stab at #65 so that if people want other models from here for their own work, then contributing those upstream here should be easy.