ergm
ergm copied to clipboard
calculation errors of the `dgwesp` term
I notice that the calculation of dgwesp
term is lower than what it should be for both "OTP" (outgoing two-paths) and "ITP" (incoming two-paths). Interestingly, the gwesp
term, when applied to digraph, calculates the dgwesp(OTP) correctly. The problem only occurs when the network size is large enough, and it always under-counts the statistics in my experiment, so I suspect there might be some overflow issues that only happen to the dgwesp
implementation.
Please see below a toy example that demonstrates the problem. Basically the script obtains the census of desp
(edgewise shared-partners), and calculates the dgwesp statistics by hand to compare it with results from dgwesp
and gwesp
in ergm.
Thanks!
library(network)
library(sna)
library(ergm)
#generate a random network----
nw <- network(rgraph(100))
as.sociomatrix(nw)
#calculate directed edgewise shared partners using esp census-----
#For OTP: outgoing two-paths
desp.otp <- summary(nw~desp(0:(network.size(nw)-2), type="OTP"))
desp.otp
#a gw function
gwesp <- function(alpha, i){
exp(alpha)*(1-(1-exp(-alpha))^i)
}
#calculate dgwesp (OTP) from desp, decay=0.75
x <- 0
for (i in 1:length(desp.otp)) {
x <- x+gwesp(alpha=0.75,i=i-1)*desp.otp[i] #i-1: start from 0
}
x
#calculate dgwesp (OTP) using ergm's dgwesp and gwesp------
summary(nw~dgwesp(0.75, fixed=T, type="OTP")+gwesp(0.75, fixed=T))
#replicate this for ITP: incoming two-paths-----
desp.itp <- summary(nw~desp(0:(network.size(nw)-2), type="ITP"))
desp.itp
y <- 0
for (i in 1:length(desp.itp)) {
y <- y+gwesp(alpha=0.75,i=i-1)*desp.itp[i] #i-1: start from 0
}
y
summary(nw~dgwesp(0.75, fixed=T, type="ITP"))
Thanks for the detailed report! The problem is that whereas the gwesp()
implementation computes the quantity of interest directly, the dgwesp
implementation first computes the desp()
frequencies, then takes their weighted sum. Since most problems in practice involve sparse networks, the SP quantities for which it keeps track is capped by gw.cutoff
term option (see options?ergm
or help for any of these terms). This defaults to 30, and if a network distribution has higher shared partner counts than that, it will result in incorrect values.
You can set this cutoff globally. For example running options(ergm.term=list(gw.cutoff=100))
before running all of your code will result in matching answers.
All variants of gw*sp(fix=TRUE)
now use the full SP distribution regardless of cutoff. dg*sp(fix=FALSE)
and gwb*sp(fix=FALSE)
will immediately stop with an error if they ever encounter any configurations past the cutoff.