ergm
ergm copied to clipboard
gw*degree terms do not have default 'decay' like gw*sp terms
I recently encountered an issue with the gw*degree terms. Pleas see the simple reproducible example below:
library(ergm) data(sampson) ergm(samplike ~ edges + gwodegree(fixed = FALSE))
The output is: Error: In term ‘gwodegree’ in package ‘ergm’: argument ‘decay’ is missing, with no default.
I encountered this with an undirected network using gwdegree(fixed = FALSE), but it reproduces in the directed case with gwodegree and gwidegree terms as well.
The first argument in gw* terms is the 'decay' term, which does not have a default set. In contrast gwesp does have a default set.
From help(ergm.terms): • gwesp(decay=0, fixed=FALSE, cutoff=30) • gwidegree(decay, fixed=FALSE, attr=NULL, cutoff=30, levels=NULL)
The error can be circumvented if 'decay' is specified for the gw*degree terms, so this is not preventing their use at this time. But, I wanted to report the discrepancy.
Thank you, Jonathan
I am reluctant to add a default for gw*degree
. Basically, for gw*sp
, we know that the 0 decay is a "safe" value: it represents transitive ties or dyads. It's not quite as clear for the degrees. Do we have an interpretation for those?
Is there a reason it wouldn't be the same-ish interpretation: the propensity to have at least one tie?
Hi Pavel,
Thank you for your reply.
I think Martina is correct here. The resultant model term would model the propensity for each node to have at least one tie.
It would have an equivalent effect of an isolates term, as capturing the average propensity for nodes to have degree 0 necessarily captures the average propensity for nodes to have degree greater than 0. The difference being the interpretation of the parameter and model term.
I made a short write-out showing the statistic and natural parameter for the model term when the decay = 0: gwdegree_decay0.pdf
Perhaps one option to work-around the error without setting a decay default for GW*Degree terms would be to set the decay default to NULL. When 'fixed = FALSE' is selected, it would no longer throw an error, and when 'fixed = TRUE' is selected, it would require specification of the decay parameter, which is currently the case.
Best, Jonathan
Thanks for looking into this, @jrstew , and sorry about the slow reply. The argument will be optional in 4.0 (thanks to the curved MPLE implementation), so I am not sure it's worth adding a default here now, particularly since "nonisolates" might not be the sort of behaviour that the end-user would expects.
... before we close this issue, i'd like to verify that the curved models now fit in reasonable time for larger (e.g., faux.magnolia sized) networks. if not, we may want to implement a simple way for users to specify the curved parameter that involves a default value (e.g., setting fixed=T
has a default).
@martinamorris , do you want to verify this before we close this?
Summary of this and next 2 comments:
- Problems with
sampson
fits usinggwodegree
- No problems with fits for
faux.magnolia
(undirected, gwdegree) orfaux.dixon
(directed, gwodegree)
I'm still getting an error with:
> library(ergm)
> data(sampson)
> ergm(samplike ~ edges + gwodegree(fixed = FALSE))
Starting maximum pseudolikelihood estimation (MPLE):
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Error in check.objfun.output(out, minimize, d) :
objfun returned gradient not having all elements finite
And this hangs:
ergm(samplike ~ edges + gwodegree())
Starting maximum pseudolikelihood estimation (MPLE):
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Finished MPLE.
Warning: Model statistics ‘gwodegree’ and ‘gwodegree.decay’ are linear combinations of some set of preceding statistics at the current stage of the estimation. This may indicate that the model is nonidentifiable.
Starting Monte Carlo maximum likelihood estimation (MCMLE):
Iteration 1 of at most 60:
This works:
> ergm(samplike ~ edges + gwodegree(decay=0, fixed=T))
Observed statistic(s) gwodeg.fixed.0 are at their greatest attainable values. Their coefficients will be fixed at +Inf.
Starting maximum pseudolikelihood estimation (MPLE):
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Finished MPLE.
Starting Monte Carlo maximum likelihood estimation (MCMLE):
Iteration 1 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0012.
Convergence test p-value: 0.0001. Converged with 99% confidence.
Finished MCMLE.
Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
Bridging between the dyad-independent submodel and the full model...
Setting up bridge sampling...
Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
Bridging finished.
This model was fit using MCMC. To examine model diagnostics and check
for degeneracy, use the mcmc.diagnostics() function.
Call:
ergm(formula = samplike ~ edges + gwodegree(decay = 0, fixed = T))
Last MCMC sample of size 156 based on:
edges gwodeg.fixed.0
-0.9072 Inf
Monte Carlo Maximum Likelihood Coefficients:
edges gwodeg.fixed.0
-0.9132 Inf
Note:
> summary(samplike ~ odegree(0:10))
odegree0 odegree1 odegree2 odegree3 odegree4 odegree5 odegree6 odegree7
0 0 0 1 5 7 5 0
odegree8 odegree9 odegree10
0 0 0
Unconstrained estimation works fast and well for faux.magnolia.high
-- which is undirected:
> data("faux.magnolia.high")
> ergm(faux.magnolia.high ~ edges + gwdegree())
Starting maximum pseudolikelihood estimation (MPLE):
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Finished MPLE.
Starting Monte Carlo maximum likelihood estimation (MCMLE):
Iteration 1 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.5452.
Estimating equations are not within tolerance region.
Iteration 2 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0040.
Convergence test p-value: 0.5452. Not converged with 99% confidence; increasing sample size.
Iteration 3 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0373.
Convergence test p-value: 0.1324. Not converged with 99% confidence; increasing sample size.
Iteration 4 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0119.
Convergence test p-value: 0.0344. Not converged with 99% confidence; increasing sample size.
Iteration 5 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0092.
Convergence test p-value: 0.0036. Converged with 99% confidence.
Finished MCMLE.
Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
Bridging between the dyad-independent submodel and the full model...
Setting up bridge sampling...
Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
Bridging finished.
This model was fit using MCMC. To examine model diagnostics and check
for degeneracy, use the mcmc.diagnostics() function.
Call:
ergm(formula = faux.magnolia.high ~ edges + gwdegree())
Last MCMC sample of size 1050 based on:
edges gwdegree gwdegree.decay
-5.6558 -1.2097 0.9041
Monte Carlo Maximum Likelihood Coefficients:
edges gwdegree gwdegree.decay
-5.6353 -1.2207 0.9184
It also works fast and well when specifying decay=0, fixed=T
And unconstrained estimation works fast and well for faux.dixon.high
(directed, n=248), as does specification decay=0, fixed=T
(not shown) and just fixed = F
(also not shown):
summary(faux.dixon.high ~ odegree(0:15)) odegree0 odegree1 odegree2 odegree3 odegree4 odegree5 odegree6 odegree7 50 8 19 30 21 27 14 20 odegree8 odegree9 odegree10 odegree11 odegree12 odegree13 odegree14 odegree15 18 14 6 6 3 2 4 0 ergm(faux.dixon.high ~ edges + gwodegree()) Starting maximum pseudolikelihood estimation (MPLE): Evaluating the predictor and response matrix. Maximizing the pseudolikelihood. Finished MPLE. Starting Monte Carlo maximum likelihood estimation (MCMLE): Iteration 1 of at most 60: Optimizing with step length 0.3570. The log-likelihood improved by 3.3858. Estimating equations are not within tolerance region. Iteration 2 of at most 60: Optimizing with step length 0.7030. The log-likelihood improved by 11.4690. Estimating equations are not within tolerance region. Iteration 3 of at most 60: Optimizing with step length 1.0000. The log-likelihood improved by 0.5792. Estimating equations are not within tolerance region. Iteration 4 of at most 60: Optimizing with step length 1.0000. The log-likelihood improved by 0.0116. Convergence test p-value: 0.1809. Not converged with 99% confidence; increasing sample size. Iteration 5 of at most 60: Optimizing with step length 1.0000. The log-likelihood improved by 0.0633. Convergence test p-value: 0.2709. Not converged with 99% confidence; increasing sample size. Iteration 6 of at most 60: Optimizing with step length 1.0000. The log-likelihood improved by 0.0258. Convergence test p-value: 0.0097. Converged with 99% confidence. Finished MCMLE. Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel... Bridging between the dyad-independent submodel and the full model... Setting up bridge sampling... Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 . Bridging finished. This model was fit using MCMC. To examine model diagnostics and check for degeneracy, use the mcmc.diagnostics() function.
Call: ergm(formula = faux.dixon.high ~ edges + gwodegree())
Last MCMC sample of size 640 based on:
edges gwodegree gwodegree.decay
-3.329 -2.625 1.107
Monte Carlo Maximum Likelihood Coefficients:
edges gwodegree gwodegree.decay
-3.337 -2.615 1.095
samplike
outdegrees is a poor example, because it has weird outdegree constraints: Sampson asked each monk to name their three most liked monks each at three time points (and I think one or two didn't comply and named 4). A tie is defined to exist if an alter was nominated by the ego at least once.
As a result, we get
> degreedist(samplike)
odegree3 odegree4 odegree5 odegree6
1 5 7 5
idegree2 idegree3 idegree4 idegree5 idegree6 idegree7 idegree8 idegree10 idegree11
3 5 1 3 2 1 1 1 1
and it's not at all clear what gwodegree
is supposed to do with that.
That said, I am getting consistent errors for both of the following, which is more problematic:
library(ergm)
data(sampson)
ergm(samplike ~ edges + gwidegree)
ergm(samplike ~ edges + gwodegree, constraints=~bd(minout=3, maxout=9))
It might just be that there is not enough information to estimate the curved parameter, but it should still handle things more gracefully.
Yep. Is there a way to identify insufficient information and include that in the error msg?