ergm icon indicating copy to clipboard operation
ergm copied to clipboard

gw*degree terms do not have default 'decay' like gw*sp terms

Open jrstew opened this issue 5 years ago • 12 comments

I recently encountered an issue with the gw*degree terms. Pleas see the simple reproducible example below:

library(ergm) data(sampson) ergm(samplike ~ edges + gwodegree(fixed = FALSE))

The output is: Error: In term ‘gwodegree’ in package ‘ergm’: argument ‘decay’ is missing, with no default.

I encountered this with an undirected network using gwdegree(fixed = FALSE), but it reproduces in the directed case with gwodegree and gwidegree terms as well.

The first argument in gw* terms is the 'decay' term, which does not have a default set. In contrast gwesp does have a default set.

From help(ergm.terms): • gwesp(decay=0, fixed=FALSE, cutoff=30) • gwidegree(decay, fixed=FALSE, attr=NULL, cutoff=30, levels=NULL)

The error can be circumvented if 'decay' is specified for the gw*degree terms, so this is not preventing their use at this time. But, I wanted to report the discrepancy.

Thank you, Jonathan

jrstew avatar Sep 13 '19 18:09 jrstew

I am reluctant to add a default for gw*degree. Basically, for gw*sp, we know that the 0 decay is a "safe" value: it represents transitive ties or dyads. It's not quite as clear for the degrees. Do we have an interpretation for those?

krivit avatar Sep 20 '20 12:09 krivit

Is there a reason it wouldn't be the same-ish interpretation: the propensity to have at least one tie?

martinamorris avatar Sep 21 '20 18:09 martinamorris

Hi Pavel,

Thank you for your reply.

I think Martina is correct here. The resultant model term would model the propensity for each node to have at least one tie.

It would have an equivalent effect of an isolates term, as capturing the average propensity for nodes to have degree 0 necessarily captures the average propensity for nodes to have degree greater than 0. The difference being the interpretation of the parameter and model term.

I made a short write-out showing the statistic and natural parameter for the model term when the decay = 0: gwdegree_decay0.pdf

Perhaps one option to work-around the error without setting a decay default for GW*Degree terms would be to set the decay default to NULL. When 'fixed = FALSE' is selected, it would no longer throw an error, and when 'fixed = TRUE' is selected, it would require specification of the decay parameter, which is currently the case.

Best, Jonathan

jrstew avatar Sep 21 '20 19:09 jrstew

Thanks for looking into this, @jrstew , and sorry about the slow reply. The argument will be optional in 4.0 (thanks to the curved MPLE implementation), so I am not sure it's worth adding a default here now, particularly since "nonisolates" might not be the sort of behaviour that the end-user would expects.

krivit avatar Dec 23 '20 07:12 krivit

... before we close this issue, i'd like to verify that the curved models now fit in reasonable time for larger (e.g., faux.magnolia sized) networks. if not, we may want to implement a simple way for users to specify the curved parameter that involves a default value (e.g., setting fixed=T has a default).

martinamorris avatar Dec 23 '20 18:12 martinamorris

@martinamorris , do you want to verify this before we close this?

krivit avatar May 05 '21 10:05 krivit

Summary of this and next 2 comments:

  • Problems with sampson fits using gwodegree
  • No problems with fits for faux.magnolia (undirected, gwdegree) or faux.dixon (directed, gwodegree)

I'm still getting an error with:

> library(ergm)
> data(sampson)
> ergm(samplike ~ edges + gwodegree(fixed = FALSE))
Starting maximum pseudolikelihood estimation (MPLE):
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Error in check.objfun.output(out, minimize, d) : 
  objfun returned gradient not having all elements finite

And this hangs:

ergm(samplike ~ edges + gwodegree())
Starting maximum pseudolikelihood estimation (MPLE):
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Finished MPLE.
Warning: Model statistics ‘gwodegree’ and ‘gwodegree.decay’ are linear combinations of some set of preceding statistics at the current stage of the estimation. This may indicate that the model is nonidentifiable.
Starting Monte Carlo maximum likelihood estimation (MCMLE):
Iteration 1 of at most 60:

This works:

> ergm(samplike ~ edges + gwodegree(decay=0, fixed=T))
Observed statistic(s) gwodeg.fixed.0 are at their greatest attainable values. Their coefficients will be fixed at +Inf.
Starting maximum pseudolikelihood estimation (MPLE):
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Finished MPLE.
Starting Monte Carlo maximum likelihood estimation (MCMLE):
Iteration 1 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0012.
Convergence test p-value: 0.0001. Converged with 99% confidence.
Finished MCMLE.
Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
Bridging between the dyad-independent submodel and the full model...
Setting up bridge sampling...
Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
Bridging finished.
This model was fit using MCMC.  To examine model diagnostics and check
for degeneracy, use the mcmc.diagnostics() function.

Call:
ergm(formula = samplike ~ edges + gwodegree(decay = 0, fixed = T))

Last MCMC sample of size 156 based on:
         edges  gwodeg.fixed.0  
       -0.9072             Inf  

Monte Carlo Maximum Likelihood Coefficients:
         edges  gwodeg.fixed.0  
       -0.9132             Inf 

Note:

> summary(samplike ~ odegree(0:10))
 odegree0  odegree1  odegree2  odegree3  odegree4  odegree5  odegree6  odegree7 
        0         0         0         1         5         7         5         0 
 odegree8  odegree9 odegree10 
        0         0         0 

martinamorris avatar May 21 '21 15:05 martinamorris

Unconstrained estimation works fast and well for faux.magnolia.high -- which is undirected:

> data("faux.magnolia.high")
> ergm(faux.magnolia.high ~ edges + gwdegree())
Starting maximum pseudolikelihood estimation (MPLE):
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Finished MPLE.
Starting Monte Carlo maximum likelihood estimation (MCMLE):
Iteration 1 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.5452.
Estimating equations are not within tolerance region.
Iteration 2 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0040.
Convergence test p-value: 0.5452. Not converged with 99% confidence; increasing sample size.
Iteration 3 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0373.
Convergence test p-value: 0.1324. Not converged with 99% confidence; increasing sample size.
Iteration 4 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0119.
Convergence test p-value: 0.0344. Not converged with 99% confidence; increasing sample size.
Iteration 5 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0092.
Convergence test p-value: 0.0036. Converged with 99% confidence.
Finished MCMLE.
Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
Bridging between the dyad-independent submodel and the full model...
Setting up bridge sampling...
Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
Bridging finished.
This model was fit using MCMC.  To examine model diagnostics and check
for degeneracy, use the mcmc.diagnostics() function.

Call:
ergm(formula = faux.magnolia.high ~ edges + gwdegree())

Last MCMC sample of size 1050 based on:
         edges        gwdegree  gwdegree.decay  
       -5.6558         -1.2097          0.9041  

Monte Carlo Maximum Likelihood Coefficients:
         edges        gwdegree  gwdegree.decay  
       -5.6353         -1.2207          0.9184 

It also works fast and well when specifying decay=0, fixed=T

martinamorris avatar May 21 '21 15:05 martinamorris

And unconstrained estimation works fast and well for faux.dixon.high (directed, n=248), as does specification decay=0, fixed=T (not shown) and just fixed = F (also not shown):

summary(faux.dixon.high ~ odegree(0:15)) odegree0 odegree1 odegree2 odegree3 odegree4 odegree5 odegree6 odegree7 50 8 19 30 21 27 14 20 odegree8 odegree9 odegree10 odegree11 odegree12 odegree13 odegree14 odegree15 18 14 6 6 3 2 4 0 ergm(faux.dixon.high ~ edges + gwodegree()) Starting maximum pseudolikelihood estimation (MPLE): Evaluating the predictor and response matrix. Maximizing the pseudolikelihood. Finished MPLE. Starting Monte Carlo maximum likelihood estimation (MCMLE): Iteration 1 of at most 60: Optimizing with step length 0.3570. The log-likelihood improved by 3.3858. Estimating equations are not within tolerance region. Iteration 2 of at most 60: Optimizing with step length 0.7030. The log-likelihood improved by 11.4690. Estimating equations are not within tolerance region. Iteration 3 of at most 60: Optimizing with step length 1.0000. The log-likelihood improved by 0.5792. Estimating equations are not within tolerance region. Iteration 4 of at most 60: Optimizing with step length 1.0000. The log-likelihood improved by 0.0116. Convergence test p-value: 0.1809. Not converged with 99% confidence; increasing sample size. Iteration 5 of at most 60: Optimizing with step length 1.0000. The log-likelihood improved by 0.0633. Convergence test p-value: 0.2709. Not converged with 99% confidence; increasing sample size. Iteration 6 of at most 60: Optimizing with step length 1.0000. The log-likelihood improved by 0.0258. Convergence test p-value: 0.0097. Converged with 99% confidence. Finished MCMLE. Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel... Bridging between the dyad-independent submodel and the full model... Setting up bridge sampling... Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 . Bridging finished. This model was fit using MCMC. To examine model diagnostics and check for degeneracy, use the mcmc.diagnostics() function.

Call: ergm(formula = faux.dixon.high ~ edges + gwodegree())

Last MCMC sample of size 640 based on: edges gwodegree gwodegree.decay
-3.329 -2.625 1.107

Monte Carlo Maximum Likelihood Coefficients: edges gwodegree gwodegree.decay
-3.337 -2.615 1.095

martinamorris avatar May 21 '21 15:05 martinamorris

samplike outdegrees is a poor example, because it has weird outdegree constraints: Sampson asked each monk to name their three most liked monks each at three time points (and I think one or two didn't comply and named 4). A tie is defined to exist if an alter was nominated by the ego at least once.

As a result, we get

> degreedist(samplike)
odegree3 odegree4 odegree5 odegree6 
       1        5        7        5 
 idegree2  idegree3  idegree4  idegree5  idegree6  idegree7  idegree8 idegree10 idegree11 
        3         5         1         3         2         1         1         1         1 

and it's not at all clear what gwodegree is supposed to do with that.

That said, I am getting consistent errors for both of the following, which is more problematic:

library(ergm)
data(sampson)
ergm(samplike ~ edges + gwidegree)
ergm(samplike ~ edges + gwodegree, constraints=~bd(minout=3, maxout=9))

krivit avatar May 22 '21 00:05 krivit

It might just be that there is not enough information to estimate the curved parameter, but it should still handle things more gracefully.

krivit avatar May 22 '21 00:05 krivit

Yep. Is there a way to identify insufficient information and include that in the error msg?

martinamorris avatar May 24 '21 17:05 martinamorris