sdmTMB
sdmTMB copied to clipboard
Expanding abundance when fitting Bernoulli-cloglog model
One use-case for SDMs is to fit a logistic-regression using a cloglog link function, but then (instead of using the inverse-link to calculate the predictor) exponentiating the linear predictor to calculate abundance that could then be reported from predict
or when calculating an area-weighted abundance index. This is, e.g., used by Gruss-Thorson-2019, and it matches the interpretation of a cloglog link as a thinned Poisson-point-process from the presence-only literature. It is implemented in VAST using ObsModel[13,1], as demonstrated in the combined-data demo.
I envision that we need some interface that separates (1) the inverse-link function that's applied when calculating the data likelihood from (2) the inverse-link function that's applied when calculating the predicted response for predict
calls. I'd then copy the same interface for use in tinyVAST
. Any ideas, or do you want to discuss @seananderson?
To clarify the problem a bit more: If you have a count $C$ from a Poisson point process with intensity \lambda
, then:
Pr(C>0) = \pi = 1 - e^{-\lambda)
We then might want to calculate the total intensity int_s{\lambda ds}
as predicted abundance. A Bernoulli GLM with cloglog-link arises fits that likelihood using linear predictor log(\lambda)
, but we then want to treat it as log-linked when calculating the abundance-index.
I think the simplest option is to pass TMB two family
arguments (perhaps family
and family_pred
, where one is used during the data-likelihood, and the other during expansion. By default, the family_pred
argument is missing, and then fixed at family_pred = family
. But advanced users could pass family_pred
(perhaps via sdmTMBcontrol
or via predict
) that over-rides the default.
As I said, I need to implement this in tinyVAST e.g., to allow this demo to give the same index results when using the presence-absence or other variables as default factor level, and I'd love to keep in lock-step with sdmTMB regarding family
interface.
Hmm. A few thoughts:
- Do we need the full second family architecture or would a second link (or vector of links) be enough? Perhaps that depends whether this might foreseeably be needed in delta families. Might one ever be using a delta model (say for biomass) and then only using the first linear predictor as part of an abundance index? Maybe? That's already baked into the Poisson-link families, I guess, with the log link.
- I was going to say I'm less fussed about the predict function, since the default is in link space anyways and the user can always inverse-link the linear predictor however they'd want, but perhaps a user would want to make conditional predictions with standard errors, say using only fixed effects.
- In a fit->predict->expansion workflow it's nice to have flexibility at the last stage for integration-specific options (like an area vector). So, at the very least, perhaps
integrate_output()
/get_index()
should have this extra argument. Sinceintegrate_output()
skips over the prediction object, I presume the argument should be there. - I'd like to keep 'control' about optimizer settings and similar if possible.
- I presume defining a
bernoulli()
family with an extra argument wouldn't work? That gets aroundbinomial()
being a built-in function. I guess this could also apply to binomial with trials > 1.
Assuming it might be used in a standard delta model and that a new Bernoulli wouldn't solve it, yes, something like family_pred
in integrate_output()
, and optionally also predict()
, would work.