DHARMa
DHARMa copied to clipboard
DHARMa residuals for jSDM
Question from a user
Ich hätte eine Frage bezüglich der adäquaten Anwendung des DHARMa-Packages im multivariaten Fall. Eventuell kennen Sie das gjam-package. Mit diesem habe ich ein JSDM für Presence-Absence Daten von mehreren Arten „jointly“ geschätzt. Ich habe also mehrere abhängige Variablen und dementsprechende Residuen. Mir ist kein Pendant zu DHARMa bekannt, dass multivariate Modelle behandelt. Daher wollte ich fragen, ob Sie glauben, dass Ihr package für einen solchen Fall ebenfalls adäquat ist oder ob Ihnen eine Alternative bekannt ist? Ich wäre jetzt folgendermaßen vorgegangen: ich hätte für jede Art einzeln „new data a la DARMAh" simuliert und dann die jeweiligen Residuen analysiert (also pro Art eine Analyse). Eine Alternative wäre alle Observationen (also über alle Arten) in einen Topf zu werfen und dann die gesamten Residuen gemeinsam zu analysieren.
We just discussed this in some length last week in a working group at ISEC 2020. In principle, if you can simulate from your fitted model, you can create residuals for each entry of the community matrix with the createDHARMa
function (just note that you will have to ensure that the simulate function returns a vector, DHARMa cannot handle matrix responses), and based on this, you can still use all DHARMa plots, so I would do
- all standard checks, including spatial autocorrelation
- and use
aggregateResiduals
to group residuals by site and species, and check that as well with all standard plots / checks
There are probably a few more things that one could do to check the specific structure of a jSDM, and that was the content of our discussion. If something happens on this front, I will update this issue.
I tried the univariate approach (i.e. dropping the dimensions and treating the JSDM outputs univariate) but it doesn't produce right residual plots. I then tried a multivariate ecdf and it seems to produce correct scaled residuals (perhaps this is different for LVM-JSDMs where we assume a univariate likelihood):
Treating outputs univariate
library(sjSDM) # devel branch
library(DHARMa)
library(mltools)
library(data.table)
## simulate community:
com = simulate_SDM(env = 2L, species = 7L, sites = 100L)
## fit model:
model = sjSDM(Y = com$response,
env = linear(com$env_weights, ~.),
iter = 50L,
family = binomial("probit"))
# increase iter for your own data
sims = simulate(model, nsim = 200L)
dim(sims)
sp = 1
dim(sims)
sim =
createDHARMa(t(array(sims, c(200, 100*7))),
c(com$response),
c(predict(model)),
integerResponse = TRUE)
plot(sim)
![image](https://user-images.githubusercontent.com/29951520/154903667-94c49364-9153-44b6-98e8-8f5c605c0887.png)
Treating outputs multivariate
vs =
sapply(1:100, function(i) {
empirical_cdf(as.data.table(sims[,i,]) + matrix(runif(200*7, -0.5, 0.5), 200, 7),
ubounds = as.data.table(com$response[i,]+runif(7,-0.5,0.5)))$CDF
})
sim$scaledResiduals = c(vs)
sim$observedResponse = c(com$response)
sim$fittedPredictedResponse = c(predict(model))
sim$nObs = 700
plot(sim)
![image](https://user-images.githubusercontent.com/29951520/154903804-00c504bb-e5d6-41df-89de-f47cc6dd7976.png)
So the solution for JSDM support could be to implement the multivariate ecdf in DHARMa.
Hi Max,
thanks - 2 questions:
a) in your multivariate code, when you use matrix(runif(200*7, -0.5, 0.5), 200, 7), this is the old idea of DHARMa randomisation, in all new versions I'm using the PIT randomisation per default ... is there a specific reason why you switched to the old randomisation procedure?
b) On a first glance, I don't really get what you mean by "treating the outputs multivariate" - you to the ecdf per site (sapply), but then in sapply, you create one residual per species, right? I think I have to run it myself to see what's in the values, but just to get the idea ...
Add-on note: in principle we could aggregate residuals per site (over species) or per species (over sites), or show all residuals, as in the internal structure ...