DHARMa icon indicating copy to clipboard operation
DHARMa copied to clipboard

DHARMa residuals for jSDM

Open florianhartig opened this issue 4 years ago • 4 comments

Question from a user

Ich hätte eine Frage bezüglich der adäquaten Anwendung des DHARMa-Packages im multivariaten Fall. Eventuell kennen Sie das gjam-package. Mit diesem habe ich ein JSDM für Presence-Absence Daten von mehreren Arten „jointly“ geschätzt. Ich habe also mehrere abhängige Variablen und dementsprechende Residuen. Mir ist kein Pendant zu DHARMa bekannt, dass multivariate Modelle behandelt. Daher wollte ich fragen, ob Sie glauben, dass Ihr package für einen solchen Fall ebenfalls adäquat ist oder ob Ihnen eine Alternative bekannt ist? Ich wäre jetzt folgendermaßen vorgegangen: ich hätte für jede Art einzeln „new data a la DARMAh" simuliert und dann die jeweiligen Residuen analysiert (also pro Art eine Analyse). Eine Alternative wäre alle Observationen (also über alle Arten) in einen Topf zu werfen und dann die gesamten Residuen gemeinsam zu analysieren.

florianhartig avatar Jun 30 '20 15:06 florianhartig

We just discussed this in some length last week in a working group at ISEC 2020. In principle, if you can simulate from your fitted model, you can create residuals for each entry of the community matrix with the createDHARMa function (just note that you will have to ensure that the simulate function returns a vector, DHARMa cannot handle matrix responses), and based on this, you can still use all DHARMa plots, so I would do

  • all standard checks, including spatial autocorrelation
  • and use aggregateResiduals to group residuals by site and species, and check that as well with all standard plots / checks

There are probably a few more things that one could do to check the specific structure of a jSDM, and that was the content of our discussion. If something happens on this front, I will update this issue.

florianhartig avatar Jun 30 '20 15:06 florianhartig

I tried the univariate approach (i.e. dropping the dimensions and treating the JSDM outputs univariate) but it doesn't produce right residual plots. I then tried a multivariate ecdf and it seems to produce correct scaled residuals (perhaps this is different for LVM-JSDMs where we assume a univariate likelihood):

Treating outputs univariate

library(sjSDM) # devel branch
library(DHARMa)
library(mltools)
library(data.table)


## simulate community:
com = simulate_SDM(env = 2L, species = 7L, sites = 100L)

## fit model:
model = sjSDM(Y = com$response,
              env = linear(com$env_weights, ~.), 
              iter = 50L,
              family = binomial("probit"))
# increase iter for your own data 
sims = simulate(model, nsim = 200L)
dim(sims)

sp = 1
dim(sims)
sim = 
  createDHARMa(t(array(sims, c(200, 100*7))), 
              c(com$response), 
              c(predict(model)),
              integerResponse = TRUE)
plot(sim)

image

Treating outputs multivariate

vs = 
sapply(1:100, function(i) {
  empirical_cdf(as.data.table(sims[,i,]) + matrix(runif(200*7, -0.5, 0.5), 200, 7),
              ubounds = as.data.table(com$response[i,]+runif(7,-0.5,0.5)))$CDF
})

sim$scaledResiduals = c(vs)
sim$observedResponse = c(com$response)
sim$fittedPredictedResponse = c(predict(model))
sim$nObs = 700
plot(sim)
image

So the solution for JSDM support could be to implement the multivariate ecdf in DHARMa.

MaximilianPi avatar Feb 21 '22 06:02 MaximilianPi

Hi Max,

thanks - 2 questions:

a) in your multivariate code, when you use matrix(runif(200*7, -0.5, 0.5), 200, 7), this is the old idea of DHARMa randomisation, in all new versions I'm using the PIT randomisation per default ... is there a specific reason why you switched to the old randomisation procedure?

b) On a first glance, I don't really get what you mean by "treating the outputs multivariate" - you to the ecdf per site (sapply), but then in sapply, you create one residual per species, right? I think I have to run it myself to see what's in the values, but just to get the idea ...

florianhartig avatar Feb 21 '22 16:02 florianhartig

Add-on note: in principle we could aggregate residuals per site (over species) or per species (over sites), or show all residuals, as in the internal structure ...

florianhartig avatar Feb 21 '22 16:02 florianhartig