vegan icon indicating copy to clipboard operation
vegan copied to clipboard

PERMANOVA and the Behrens-Fisher F statistic

Open davismj87 opened this issue 5 years ago • 7 comments

I am working with a data set with an unbalanced sampling design that violates the assumption of homogeneity of group dispersions. I was informed that one can modify the source code for the adonis() function to calculate a modified Behrens-Fisher F-statistic per Anderson et al. 2017 (doi: 10.1111/anzs.12176). Has anyone tried this? Is there a version of the source code already available that can accomplish this task? I was able to brute force calculate the BFS for a simple, one-variable PERMANOVA based on the mathematical descriptions provided in the article; however, my abilities with matrix algebra aren't super sophisticated, so it's been hard for me to wrap my head around how to calculate the BFS for a PERMANOVA with multiple variables and interaction effects. Any information the devs or the community might have about how to code this in R and/or modify the adonis() function would be super helpful. Thank you!

davismj87 avatar Feb 10 '20 18:02 davismj87

Thanks for the query. Fascinating / interesting paper. Maybe drop Anderson an email, if you have no luck here.

Much appreciate having it highlighted in the mail. Best wishes, Mark

On Mon, 10 Feb 2020, 18:34 Melanie, [email protected] wrote:

I am working with a data set with an unbalanced sampling design that violates the assumption of homogeneity of group dispersions. I was informed that one can modify the source code for the adonis() function to calculate a modified Behrens-Fisher F-statistic per Anderson et al. 2017 (doi: 10.1111/anzs.12176). Has anyone tried this? Is there a version of the source code already available that can accomplish this task? I was able to brute force calculate the BFS for a simple, one-variable PERMANOVA based on the mathematical descriptions provided in the article; however, my abilities with matrix algebra aren't super sophisticated, so it's been hard for me to wrap my head around how to calculate the BFS for a PERMANOVA with multiple variables and interaction effects. Any information the devs or the community might have about how to code this in R and/or modify the adonis() function would be super helpful. Thank you!

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/vegandevs/vegan/issues/344?email_source=notifications&email_token=AELPYXBZRXBEW345JJAN7ZTRCGM37A5CNFSM4KSTK6NKYY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4IMK35VQ, or

MarkReader avatar Feb 10 '20 23:02 MarkReader

You can examine the code for adonis here - https://github.com/jarioksa/vegan/blob/master/R/adonis.R

I agree that the solution provided in Anderson et al. 2017 should be implemented in the adonis function. It is currently provided in the PERMANOVA+ module for PRIMER-e.

Currently there is no R implementation of this solution that I am aware of.

🙏 any statistical gurus out there want to help out? 🧞‍♂️

leholman avatar Feb 19 '20 10:02 leholman

If still relevant - I recently emailed PRIMER about this and got a reply from Anderson who said the adjusted F value in the 2017 paper is currently only available in the PRIMER software (as far as she is aware!). If anyone can implement this into the adonis function you're amazing!

ecologyjh avatar Jun 29 '20 23:06 ecologyjh

@ecologyjh, it has also been implemented in the Fathom Toolbox in MATLAB (functions f_permanova and f_permanovaPW).

ailich avatar Nov 12 '20 19:11 ailich

For a one-way ANOVA with just a standard categorical factor I believe this would calculate the modified F (I'm getting the same modified F as the Fathom Toolbox implementation). I took the beginning of the adonis function and then added a block of code to calculate F.Mod and for now commented out the standard way F.Mod is calculated. That'd need to be added into where f.perm is determined in adonis for significance tests though and I'm not sure exactly how to do that. Also, I'm not sure how this would expand to cases more than just a standard one-way anova (though according to the paper, that is doable).

Modified_F<- function(formula, data,
permutations = 999,
method = "bray", 
strata = NULL,
contr.unordered = "contr.sum", 
contr.ordered = "contr.poly",
parallel = getOption("mc.cores")){
  EPS <- sqrt(.Machine$double.eps)
  TOL <- 1e-07
  Terms <- terms(formula, data = data)
  lhs <- formula[[2]]
  lhs <- eval(lhs, data, parent.frame())
  formula[[2]] <- NULL
  rhs.frame <- model.frame(formula, data, drop.unused.levels = TRUE)
  op.c <- options()$contrasts
  options(contrasts = c(contr.unordered, contr.ordered))
  rhs <- model.matrix(formula, rhs.frame)
  options(contrasts = op.c)
  grps <- attr(rhs, "assign")
  qrhs <- qr(rhs)
  rhs <- rhs[, qrhs$pivot, drop = FALSE]
  rhs <- rhs[, 1:qrhs$rank, drop = FALSE]
  grps <- grps[qrhs$pivot][1:qrhs$rank]
  u.grps <- unique(grps)
  nterms <- length(u.grps) - 1
  if (nterms < 1) 
    stop("right-hand-side of formula has no usable terms")
  H.s <- lapply(2:length(u.grps), function(j) {
    Xj <- rhs[, grps %in% u.grps[1:j]]
    qrX <- qr(Xj, tol = TOL)
    Q <- qr.Q(qrX)
    tcrossprod(Q[, 1:qrX$rank])
  })
  if (inherits(lhs, "dist")) {
    if (any(lhs < -TOL)) {stop("dissimilarities must be non-negative")}
    dmat <- as.matrix(lhs^2)
  } else if ((is.matrix(lhs) || is.data.frame(lhs)) && isSymmetric(unname(as.matrix(lhs)))) {
    dmat <- as.matrix(lhs^2)
    lhs <- as.dist(lhs)
  } else {
    dist.lhs <- as.matrix(vegdist(lhs, method = method))
    dmat <- dist.lhs^2
  }
  n <- nrow(dmat)
  G <- -sweep(dmat, 1, rowMeans(dmat))/2
  SS.Exp.comb <- sapply(H.s, function(hat) sum(G * t(hat)))
  SS.Exp.each <- c(SS.Exp.comb - c(0, SS.Exp.comb[-nterms]))
  H.snterm <- H.s[[nterms]]
  tIH.snterm <- t(diag(n) - H.snterm)
  if (length(H.s) > 1) 
    for (i in length(H.s):2) H.s[[i]] <- H.s[[i]] - H.s[[i - 
                                                           1]]
  SS.Res <- sum(G * tIH.snterm)
  df.Exp <- sapply(u.grps[-1], function(i) sum(grps == i))
  df.Res <- n - qrhs$rank
  if (inherits(lhs, "dist")) {
    beta.sites <- qr.coef(qrhs, as.matrix(lhs))
    beta.spp <- NULL
  } else {
    beta.sites <- qr.coef(qrhs, dist.lhs)
    beta.spp <- qr.coef(qrhs, as.matrix(lhs))
  }
  colnames(beta.spp) <- colnames(lhs)
  colnames(beta.sites) <- rownames(lhs)
  #F.Mod <- (SS.Exp.each/df.Exp)/(SS.Res/df.Res)
  
  ###########NEW CODE STARTS HERE########################################
  grp_ID<- as.character(interaction(as.data.frame(rhs))) #Reference to group membership of each observation
  n_per_group<- table(grp_ID) #Sample size of each group
  VL<- rep(0, length = length(unique(grp_ID))) #Initialize
  names(VL)<- names(n_per_group)
  for (i in 1:(n-1)){
    for (j in (i+1):n) {
      d_ij2<- dmat[i,j]
      grp_i<-grp_ID[i]
      grp_j<- grp_ID[j]
      if(grp_i==grp_j){
        Eij_L<- 1 
      } else{
        Eij_L<- 0
      } #Indicator whether observations i and j are in the same group
      nL<- n_per_group[grp_i] #number of observations in group L
      VL[grp_i]<- VL[grp_i]+ (Eij_L*d_ij2/(nL*(nL-1))) #Within Group Dispersion for each group
    }
  }
  
  F.Mod_denominator<- 0 
  for (i in 1:length(VL)) {
    F.Mod_denominator<- F.Mod_denominator + ((1 - (n_per_group[i]/n))*VL[i])
  }
  names(F.Mod_denominator)<- NULL
  F.Mod_numerator<- SS.Exp.comb #Among Group Sum of Squares is equal to tr(HG), SS.Exp.comb seems to be that
  F.Mod<- F.Mod_numerator/F.Mod_denominator #Modified F statistic accounting for different dispersions
  return(F.Mod)
  ###########NEW CODE ENDS HERE########################################
  }

Example of Calculating modified F on the dune dataset

library(vegan)
data(dune)
data(dune.env)
Modified_F(dune~ Management, data = dune.env)
[1] 3.130212

ailich avatar Jun 07 '21 22:06 ailich

For a one-way ANOVA with just a standard categorical factor I believe this would calculate the modified F (I'm getting the same modified F as the Fathom Toolbox implementation). I took the beginning of the adonis function and then added a block of code to calculate F.Mod and for now commented out the standard way F.Mod is calculated. That'd need to be added into where f.perm is determined in adonis for significance tests though and I'm not sure exactly how to do that. Also, I'm not sure how this would expand to cases more than just a standard one-way anova (though according to the paper, that is doable).

Modified_F<- function(formula, data,
permutations = 999,
method = "bray", 
strata = NULL,
contr.unordered = "contr.sum", 
contr.ordered = "contr.poly",
parallel = getOption("mc.cores")){
  EPS <- sqrt(.Machine$double.eps)
  TOL <- 1e-07
  Terms <- terms(formula, data = data)
  lhs <- formula[[2]]
  lhs <- eval(lhs, data, parent.frame())
  formula[[2]] <- NULL
  rhs.frame <- model.frame(formula, data, drop.unused.levels = TRUE)
  op.c <- options()$contrasts
  options(contrasts = c(contr.unordered, contr.ordered))
  rhs <- model.matrix(formula, rhs.frame)
  options(contrasts = op.c)
  grps <- attr(rhs, "assign")
  qrhs <- qr(rhs)
  rhs <- rhs[, qrhs$pivot, drop = FALSE]
  rhs <- rhs[, 1:qrhs$rank, drop = FALSE]
  grps <- grps[qrhs$pivot][1:qrhs$rank]
  u.grps <- unique(grps)
  nterms <- length(u.grps) - 1
  if (nterms < 1) 
    stop("right-hand-side of formula has no usable terms")
  H.s <- lapply(2:length(u.grps), function(j) {
    Xj <- rhs[, grps %in% u.grps[1:j]]
    qrX <- qr(Xj, tol = TOL)
    Q <- qr.Q(qrX)
    tcrossprod(Q[, 1:qrX$rank])
  })
  if (inherits(lhs, "dist")) {
    if (any(lhs < -TOL)) {stop("dissimilarities must be non-negative")}
    dmat <- as.matrix(lhs^2)
  } else if ((is.matrix(lhs) || is.data.frame(lhs)) && isSymmetric(unname(as.matrix(lhs)))) {
    dmat <- as.matrix(lhs^2)
    lhs <- as.dist(lhs)
  } else {
    dist.lhs <- as.matrix(vegdist(lhs, method = method))
    dmat <- dist.lhs^2
  }
  n <- nrow(dmat)
  G <- -sweep(dmat, 1, rowMeans(dmat))/2
  SS.Exp.comb <- sapply(H.s, function(hat) sum(G * t(hat)))
  SS.Exp.each <- c(SS.Exp.comb - c(0, SS.Exp.comb[-nterms]))
  H.snterm <- H.s[[nterms]]
  tIH.snterm <- t(diag(n) - H.snterm)
  if (length(H.s) > 1) 
    for (i in length(H.s):2) H.s[[i]] <- H.s[[i]] - H.s[[i - 
                                                           1]]
  SS.Res <- sum(G * tIH.snterm)
  df.Exp <- sapply(u.grps[-1], function(i) sum(grps == i))
  df.Res <- n - qrhs$rank
  if (inherits(lhs, "dist")) {
    beta.sites <- qr.coef(qrhs, as.matrix(lhs))
    beta.spp <- NULL
  } else {
    beta.sites <- qr.coef(qrhs, dist.lhs)
    beta.spp <- qr.coef(qrhs, as.matrix(lhs))
  }
  colnames(beta.spp) <- colnames(lhs)
  colnames(beta.sites) <- rownames(lhs)
  #F.Mod <- (SS.Exp.each/df.Exp)/(SS.Res/df.Res)
  
  ###########NEW CODE STARTS HERE########################################
  grp_ID<- as.character(interaction(as.data.frame(rhs))) #Reference to group membership of each observation
  n_per_group<- table(grp_ID) #Sample size of each group
  VL<- rep(0, length = length(unique(grp_ID))) #Initialize
  names(VL)<- names(n_per_group)
  for (i in 1:(n-1)){
    for (j in (i+1):n) {
      d_ij2<- dmat[i,j]
      grp_i<-grp_ID[i]
      grp_j<- grp_ID[j]
      if(grp_i==grp_j){
        Eij_L<- 1 
      } else{
        Eij_L<- 0
      } #Indicator whether observations i and j are in the same group
      nL<- n_per_group[grp_i] #number of observations in group L
      VL[grp_i]<- VL[grp_i]+ (Eij_L*d_ij2/(nL*(nL-1))) #Within Group Dispersion for each group
    }
  }
  
  F.Mod_denominator<- 0 
  for (i in 1:length(VL)) {
    F.Mod_denominator<- F.Mod_denominator + ((1 - (n_per_group[i]/n))*VL[i])
  }
  names(F.Mod_denominator)<- NULL
  F.Mod_numerator<- SS.Exp.comb #Among Group Sum of Squares is equal to tr(HG), SS.Exp.comb seems to be that
  F.Mod<- F.Mod_numerator/F.Mod_denominator #Modified F statistic accounting for different dispersions
  return(F.Mod)
  ###########NEW CODE ENDS HERE########################################
  }

Example of Calculating modified F on the dune dataset

library(vegan)
data(dune)
data(dune.env)
Modified_F(dune~ Management, data = dune.env)
[1] 3.130212

Hi, will I able to use this code for a 4-way PERMANOVA or just for one-way?

pkmnsandy avatar Jul 23 '22 10:07 pkmnsandy

@pkmnsandy, this is unfortunately just for a one-way PERMANOVA. The Fathom implementation states that this is a "one-way (modified) PERMANOVA" and the original paper states "The F2 test statistic can be extended to allow for differences in dispersions of replicates within sites, and differences in dispersions of site centroids within regions, for relevant tests of individual factors at each spatial scale. Similar extensions can be formulated and derived for tests of individual terms in fixed, random or mixed multi-way ANOVA models including interactions. We shall leave these extensions (beyond the scope of the current contribution) for a future endeavour."

ailich avatar Jul 23 '22 20:07 ailich