phyr communityPGLMM.bayes

Russell,

I've just tried communityPGLMM.bayes with a binomial distribution in the cbind(success, failure) format, and I get the error

Error in INLA::inla(as.formula(inla_formula), data = data, verbose = verbose, : dim(y...orig)[2] == 1 is not TRUE

Is this because of the format?

I'm trying the bayes version because I'm working with a large dataset, and the ML version isn't converging.

Thanks, Tony

Oct 11 '18 14:10 arives

I don't think the binomial distribution has been implemented in our code yet. The binary version should work.

Oct 11 '18 15:10 daijiang

Hi Tony,

As Daijiang mentioned, only binary coding works at the moment. It's on my to do list to add successes and trials coding. Should be pretty easy to do so I will try and get that done today. I'll let you know when the update is merged.

Cheers,

Russell

On Fri., 12 Oct. 2018, 2:04 am Anthony R. Ives, [email protected] wrote:

Russell,

I've just tried communityPGLMM.bayes with a binomial distribution in the cbind(success, failure) format, and I get the error

Error in INLA::inla(as.formula(inla_formula), data = data, verbose = verbose, : dim(y...orig)[2] == 1 is not TRUE

Is this because of the format?

I'm trying the bayes version because I'm working with a large dataset, and the ML version isn't converging.

Thanks, Tony

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/daijiang/phyr/issues/30, or mute the thread https://github.com/notifications/unsubscribe-auth/AFq4OTWQXQYplkh7F-ApMobCRitVXwCGks5uj14AgaJpZM4XXuhO .

Oct 11 '18 17:10 rdinnager

I've made a pull request where I have added this functionality, which you can try once it has been merged. You have to add the number of trials as a vector to the communityPGLMM call, like this:

phyr::communityPGLMM(freq ~ 1 + shade + (1 | sp__) + (1 | site) + (1 | sp__@site), 
                                               dat, tree = phylotree, family = 'binomial', add.obs.re = F, bayes = TRUE,
                                               Ntrials = dat$freq + dat$freq2, ML.init = FALSE,
                                               prior = "pc.prior.auto")

I would recommend using prior = "pc.prior.auto", because the default priors are usually quite bad, especially for non-gaussian response. Also, ML.init = FALSE will be necessary since ML.init = TRUE (the default) will fit the ML version first to use as starting values, and since it isn't converging, that will not be too useful. Let me know if you run into any other issues.

Cheers!

Oct 12 '18 03:10 rdinnager

Tony, Now you should be able to use the bind(success, fail) in binomial formula when using bayes = T. Remember to set ML.init = F so it won't fit the ML version first. Cheers, Daijiang

Oct 12 '18 20:10 daijiang

Daijiang,

Thanks so much for doing this! In the last 2-3 days I've looked more carefully for ways to improve the stability and speed of PGLMM. I'm doing it because the final version of Jesse's ms was accepted, but it really should include a binomial model. We're supposed to return a final version next week.

There are some small improvements possible. A problem is that the working estimates of b can get very large or small, and this causes problems with the iterative estimates of the mean effects. The values of b can be truncated (e.g., b[b<(-8)] <- (-8)), but where the cutoff is affects the results sometimes.

I also looked at the possibility of using a Laplace approximation, but there are technical reasons why I think this might not be a good solution. The Laplace transform, even though it solves the problem in a different way, still requires an iterative solution to the values of b.

By the way, your choice of nelder-mead-nlopt is a good one!

Cheers, Tony

Oct 13 '18 11:10 arives

The bayes version still doesn't seem to take the bind() specification. I installed phyr from both the master and bayes branches, and both gave the same:

mod.pglmm.bayes <- communityPGLMM(value ~ env*trait, random.effects = list(re.sp, re.spV, re.env.sp, re.site, re.obs), family="binomial", data=dat, verbose=T, bayes = TRUE, ML.init = FALSE) Error in INLA::inla(as.formula(inla_formula), data = data, verbose = verbose, : dim(y...orig)[2] == 1 is not TRUE

Oct 14 '18 12:10 arives

Tony, Can you use cbind(success, fail) instead of value in the formula? It is tricky when using value in parsing the formula. Best, Daijiang

Oct 14 '18 14:10 daijiang

Thanks. I took a quick look, and it wasn't obvious to me how to apply the same code used in the PGLMM approach in the bayes approach. It might be something worth updating just to match glmer. Formulating binomial distributions is always annoying, so making it match glmer might be easier for people.

Overall, I'm really impressed with the bayes version. Thanks Russell! Honestly, for large problems or detailed simulation studies, my way of doing binomial distributions (using PQL) is just too slow. I guess we could try to hack glmer for their code. But a lot of the speed of glmer is based on being able to decompose non-nested matrices. I guess I could also use something like a Gibbs sampler. But bayes is working well. I think I might just stick with it.

Oct 14 '18 16:10 arives

Hi Tony, I think glmer also use the cbind(success, fail) form for binomial distribution, right? Daijiang

Oct 14 '18 17:10 daijiang

I don't think you can enter "cbind(s,f)", but you can do x$value <- cbind(s,f) and then use value. That's why I set up the non-bayes version like this.

Oct 14 '18 17:10 arives

I think you can. https://rdrr.io/cran/lme4/man/glmer.html see the examples.

Oct 14 '18 18:10 daijiang

Hi Tony,

Thanks! I am glad you like the bayes version. I can't take too much credit however, I just figured out how to specify the model in INLA, which is a package which I am continually impressed with. So thanks goes to the awesome authors of INLA. I have been using it to expand the basic idea of PGLMMs into some cool extensions (such as incorporating space into the models). Anyway, I still want to write a few utility functions for communityPGLMM objects fit with bayes = TRUE, so that users can easily extract and plot things like the marginal posterior distributions of the parameters (or sample from the joint posterior). @daijiang I'm interested in make some plotting functions using ggplot2. Is it cool if I import some functions from ggplot2 into the phy namespace for that?

Oct 15 '18 04:10 rdinnager

Russell,

I think Daijiang should have final say over importing ggplot2. Honestly, it annoys me, because it seems intentionally designed to be incompatible with other data structures. But I know a lot of people use it, including Daijiang.

Boy, I do seem to sound more and more like a grumpy old fart. But you have to know that I didn't get rid of my b&w monitor until 2001. So maybe I've been a grumpy old fart all my life.

Cheers, Tony

Oct 15 '18 11:10 arives

Hi Russell, I think I should spend sometime to learn INLA later! Is it possible to use current syntax of phyr to do the models you want to do? Like including spatial correlation with (1|site__) and specify the V matrix for site? (If we treat time as "site" this can also be temporal autocorrelations) Currently, the data are limited to be organized as species nested within sites, it is possible to include another variable such as year so that temporal and spatial autocorrelations can be included at the same time.

In terms of ggplot2, I think it should be fine to import it in. We have some plotting functions to show the random term matrices in phyr using lattice, but nothing to plot posterior distributions. Is there any other packages have plot functions for INLA outputs?

Tony, I think compatibility is not an issue because we can easily transform the structure of the data to be plotted with base plot function in R. In most cases, ggplot2 uses the "long" format while base R functions use the "wide" format data sets.

Cheers, Daijiang

Oct 15 '18 13:10 daijiang

Russell and Daijiang,

I think the general specification of communityPGLMM will handle pretty much any covariance matrix if you use the re=list() version with a single element in the list. If you have a factor other than sp and site, the re=list() with 3 or 4 elements should work too (and should be faster).

As for ggplot2, I was just whining. Yes, as long as you don't need to interact with ggplot2, it doesn't matter: you can just give it stuff. It's just not very flexible.

Cheers, Tony

Oct 15 '18 14:10 arives

I just remembered why it would be good to have communityPGLMM take data.frames with a variable value = cbind(success, failure). simulate() and update() don't work for glmer() unless this format is used.

Cheers, Tony

Oct 16 '18 17:10 arives

Hi Tony, Can you post some code to show that it does not work? It seems work for me.

library(lme4)
gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
              data = cbpp, family = binomial)
update(gm1,devFunOnly=TRUE)
simulate(gm1, nsim = 1)

Cheers, Daijiang

Oct 17 '18 01:10 daijiang

Daijiang,

Oh, the problem comes when you try to have update within another function. Here is an example. It took me a while to figure this out, and I'm not sure why lme4 responds this way.

Cheers, Tony

bootMer_LRT <- function(mod, formula.r, re.form=NA, nAGQ=0, nboot=2000, verbose=F, data=NA) {

mod.f <- update(mod, nAGQ=nAGQ, control = glmerControl(calc.derivs=F))

mod.r <- update(mod, formula=formula.r, nAGQ=nAGQ, control = glmerControl(calc.derivs=F))

LLR.true <- logLik(mod.f) - logLik(mod.r)

dist <- data.frame(LLR = rep(0,nboot))

sim.data <- model.frame(mod.f)

for(j in 1:nboot){

   if(is.na(re.form)){

          simY <- simulate(mod.r)

   }else{

          simY <- simulate(mod.r, re.form=formula(re.form))

   }

   sim.data[,1] <- simY[,1]

   mod.f.boot <- update(mod.f, data=sim.data, start=list(theta=attributes(mod.f)$theta))

   mod.r.boot <- update(mod.r, data=sim.data, start=list(theta=attributes(mod.r)$theta))

   dist$LLR[j] <- logLik(mod.f.boot) - logLik(mod.r.boot)

   if(verbose==T) show(c(j, LLR.true, dist$LLR[j]))

}

P <- mean(LLR.true < dist$LLR)

return(list(mod.r=mod.r, dist.LLR=dist$LLR, P=P))

}

mod <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd), data = cbpp, family = binomial) mod.r <- glmer(cbind(incidence, size - incidence) ~ 1 + (1 | herd), data = cbpp, family = binomial) bootMer_LRT(mod, mod.r)

But this works

cbpp$value <- cbind(cbpp$incidence, cbpp$size - cbpp$incidence) mod <- glmer(value ~ period + (1 | herd), data = cbpp, family = binomial) mod.r <- glmer(value ~ 1 + (1 | herd), data = cbpp, family = binomial) bootMer_LRT(mod, mod.r, nboot=20)

From: Daijiang Li [email protected] Reply-To: daijiang/phyr [email protected] Date: Tuesday, October 16, 2018 at 8:36 PM To: daijiang/phyr [email protected] Cc: "Anthony R. Ives" [email protected], Author [email protected] Subject: Re: [daijiang/phyr] communityPGLMM.bayes (#30)

Hi Tony, Can you post some code to show that it does not work? It seems work for me.

library(lme4)

gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),

          data = cbpp, family = binomial)

update(gm1,devFunOnly=TRUE)

simulate(gm1, nsim = 1)

Cheers, Daijiang

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/daijiang/phyr/issues/30#issuecomment-430458445, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ALF_LM4KORUA9ovHhAErhV_O9EPtFSG2ks5ulomDgaJpZM4XXuhO.

Oct 17 '18 10:10 arives

Humm... I do not know. I think the formula is still cbind(success, fail) but the data frame from simulate() has the column name as "cbind(success, fail)" literally; then when update() trying to find success in the data frame, it cannot find it (because the name is "cbind(success, fail).success" now. This is not a problem for value because the name did not change after using simulate()...

Not know what to do about it yet.

Oct 17 '18 15:10 daijiang

Tony, A possible workaround for lme4:

library(lme4)

bootMer_LRT <- function(mod, formula.r = formula(mod.r), re.form=NA, nAGQ=0, nboot=20, verbose=F, data=NA) {
  mod.f <- update(mod, nAGQ=nAGQ, control = glmerControl(calc.derivs=F))
  mod.r <- update(mod, formula=formula.r, nAGQ=nAGQ, control = glmerControl(calc.derivs=F))
  LLR.true <- logLik(mod.f) - logLik(mod.r)
  dist <- data.frame(LLR = rep(0,nboot))
  for(j in 1:nboot){
    simY = simulate(mod.r)

    mod.f.boot <- lme4::refit(mod.f, newresp = simY) # not using update
    mod.r.boot <- lme4::refit(mod.r, newresp = simY)
    dist$LLR[j] <- logLik(mod.f.boot) - logLik(mod.r.boot)

    if(verbose==T) show(c(j, LLR.true, dist$LLR[j]))
  }
  P <- mean(LLR.true < dist$LLR)
  return(list(mod.r=mod.r, dist.LLR=dist$LLR, P=P))
}

cbpp$fail = cbpp$size - cbpp$incidence

mod <- glmer(cbind(incidence, fail) ~ period + (1 | herd),  data = cbpp, family = binomial)
mod.r <- glmer(cbind(incidence, fail) ~ 1 + (1 | herd), data = cbpp, family = binomial)
x1 = bootMer_LRT(mod, mod.r, verbose = T)

cbpp$value <- cbind(cbpp$incidence, cbpp$size - cbpp$incidence)
mod <- glmer(value ~ period + (1 | herd),  data = cbpp, family = binomial)
mod.r <- glmer(value ~ 1 + (1 | herd), data = cbpp, family = binomial)
x2 = bootMer_LRT(mod, mod.r, nboot=20, verbose = T)
x1
x2

Oct 17 '18 15:10 daijiang

Daijiang,

This is great. Thanks. I've updated the code for Jesse's ms with this. This is the type of things that drives me crazy! Thanks for figuring it out.

Cheers, Tony

Anthony R. Ives UW-Madison 459 Birge Hall 608-262-1519

From: Daijiang Li [email protected] Reply-To: daijiang/phyr [email protected] Date: Wednesday, October 17, 2018 at 10:59 AM To: daijiang/phyr [email protected] Cc: "Anthony R. Ives" [email protected], Author [email protected] Subject: Re: [daijiang/phyr] communityPGLMM.bayes (#30)

lme4::refit

Oct 17 '18 18:10 arives

phyr phyr copied to clipboard

communityPGLMM.bayes

phyr
phyr copied to clipboard