vegan
vegan copied to clipboard
PERMANOVA and the Behrens-Fisher F statistic
I am working with a data set with an unbalanced sampling design that violates the assumption of homogeneity of group dispersions. I was informed that one can modify the source code for the adonis() function to calculate a modified Behrens-Fisher F-statistic per Anderson et al. 2017 (doi: 10.1111/anzs.12176). Has anyone tried this? Is there a version of the source code already available that can accomplish this task? I was able to brute force calculate the BFS for a simple, one-variable PERMANOVA based on the mathematical descriptions provided in the article; however, my abilities with matrix algebra aren't super sophisticated, so it's been hard for me to wrap my head around how to calculate the BFS for a PERMANOVA with multiple variables and interaction effects. Any information the devs or the community might have about how to code this in R and/or modify the adonis() function would be super helpful. Thank you!
Thanks for the query. Fascinating / interesting paper. Maybe drop Anderson an email, if you have no luck here.
Much appreciate having it highlighted in the mail. Best wishes, Mark
On Mon, 10 Feb 2020, 18:34 Melanie, [email protected] wrote:
I am working with a data set with an unbalanced sampling design that violates the assumption of homogeneity of group dispersions. I was informed that one can modify the source code for the adonis() function to calculate a modified Behrens-Fisher F-statistic per Anderson et al. 2017 (doi: 10.1111/anzs.12176). Has anyone tried this? Is there a version of the source code already available that can accomplish this task? I was able to brute force calculate the BFS for a simple, one-variable PERMANOVA based on the mathematical descriptions provided in the article; however, my abilities with matrix algebra aren't super sophisticated, so it's been hard for me to wrap my head around how to calculate the BFS for a PERMANOVA with multiple variables and interaction effects. Any information the devs or the community might have about how to code this in R and/or modify the adonis() function would be super helpful. Thank you!
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/vegandevs/vegan/issues/344?email_source=notifications&email_token=AELPYXBZRXBEW345JJAN7ZTRCGM37A5CNFSM4KSTK6NKYY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4IMK35VQ, or
You can examine the code for adonis here - https://github.com/jarioksa/vegan/blob/master/R/adonis.R
I agree that the solution provided in Anderson et al. 2017 should be implemented in the adonis function. It is currently provided in the PERMANOVA+ module for PRIMER-e.
Currently there is no R implementation of this solution that I am aware of.
🙏 any statistical gurus out there want to help out? 🧞♂️
If still relevant - I recently emailed PRIMER about this and got a reply from Anderson who said the adjusted F value in the 2017 paper is currently only available in the PRIMER software (as far as she is aware!). If anyone can implement this into the adonis function you're amazing!
@ecologyjh, it has also been implemented in the Fathom Toolbox in MATLAB (functions f_permanova and f_permanovaPW).
For a one-way ANOVA with just a standard categorical factor I believe this would calculate the modified F (I'm getting the same modified F as the Fathom Toolbox implementation). I took the beginning of the adonis
function and then added a block of code to calculate F.Mod
and for now commented out the standard way F.Mod
is calculated. That'd need to be added into where f.perm
is determined in adonis
for significance tests though and I'm not sure exactly how to do that. Also, I'm not sure how this would expand to cases more than just a standard one-way anova (though according to the paper, that is doable).
Modified_F<- function(formula, data,
permutations = 999,
method = "bray",
strata = NULL,
contr.unordered = "contr.sum",
contr.ordered = "contr.poly",
parallel = getOption("mc.cores")){
EPS <- sqrt(.Machine$double.eps)
TOL <- 1e-07
Terms <- terms(formula, data = data)
lhs <- formula[[2]]
lhs <- eval(lhs, data, parent.frame())
formula[[2]] <- NULL
rhs.frame <- model.frame(formula, data, drop.unused.levels = TRUE)
op.c <- options()$contrasts
options(contrasts = c(contr.unordered, contr.ordered))
rhs <- model.matrix(formula, rhs.frame)
options(contrasts = op.c)
grps <- attr(rhs, "assign")
qrhs <- qr(rhs)
rhs <- rhs[, qrhs$pivot, drop = FALSE]
rhs <- rhs[, 1:qrhs$rank, drop = FALSE]
grps <- grps[qrhs$pivot][1:qrhs$rank]
u.grps <- unique(grps)
nterms <- length(u.grps) - 1
if (nterms < 1)
stop("right-hand-side of formula has no usable terms")
H.s <- lapply(2:length(u.grps), function(j) {
Xj <- rhs[, grps %in% u.grps[1:j]]
qrX <- qr(Xj, tol = TOL)
Q <- qr.Q(qrX)
tcrossprod(Q[, 1:qrX$rank])
})
if (inherits(lhs, "dist")) {
if (any(lhs < -TOL)) {stop("dissimilarities must be non-negative")}
dmat <- as.matrix(lhs^2)
} else if ((is.matrix(lhs) || is.data.frame(lhs)) && isSymmetric(unname(as.matrix(lhs)))) {
dmat <- as.matrix(lhs^2)
lhs <- as.dist(lhs)
} else {
dist.lhs <- as.matrix(vegdist(lhs, method = method))
dmat <- dist.lhs^2
}
n <- nrow(dmat)
G <- -sweep(dmat, 1, rowMeans(dmat))/2
SS.Exp.comb <- sapply(H.s, function(hat) sum(G * t(hat)))
SS.Exp.each <- c(SS.Exp.comb - c(0, SS.Exp.comb[-nterms]))
H.snterm <- H.s[[nterms]]
tIH.snterm <- t(diag(n) - H.snterm)
if (length(H.s) > 1)
for (i in length(H.s):2) H.s[[i]] <- H.s[[i]] - H.s[[i -
1]]
SS.Res <- sum(G * tIH.snterm)
df.Exp <- sapply(u.grps[-1], function(i) sum(grps == i))
df.Res <- n - qrhs$rank
if (inherits(lhs, "dist")) {
beta.sites <- qr.coef(qrhs, as.matrix(lhs))
beta.spp <- NULL
} else {
beta.sites <- qr.coef(qrhs, dist.lhs)
beta.spp <- qr.coef(qrhs, as.matrix(lhs))
}
colnames(beta.spp) <- colnames(lhs)
colnames(beta.sites) <- rownames(lhs)
#F.Mod <- (SS.Exp.each/df.Exp)/(SS.Res/df.Res)
###########NEW CODE STARTS HERE########################################
grp_ID<- as.character(interaction(as.data.frame(rhs))) #Reference to group membership of each observation
n_per_group<- table(grp_ID) #Sample size of each group
VL<- rep(0, length = length(unique(grp_ID))) #Initialize
names(VL)<- names(n_per_group)
for (i in 1:(n-1)){
for (j in (i+1):n) {
d_ij2<- dmat[i,j]
grp_i<-grp_ID[i]
grp_j<- grp_ID[j]
if(grp_i==grp_j){
Eij_L<- 1
} else{
Eij_L<- 0
} #Indicator whether observations i and j are in the same group
nL<- n_per_group[grp_i] #number of observations in group L
VL[grp_i]<- VL[grp_i]+ (Eij_L*d_ij2/(nL*(nL-1))) #Within Group Dispersion for each group
}
}
F.Mod_denominator<- 0
for (i in 1:length(VL)) {
F.Mod_denominator<- F.Mod_denominator + ((1 - (n_per_group[i]/n))*VL[i])
}
names(F.Mod_denominator)<- NULL
F.Mod_numerator<- SS.Exp.comb #Among Group Sum of Squares is equal to tr(HG), SS.Exp.comb seems to be that
F.Mod<- F.Mod_numerator/F.Mod_denominator #Modified F statistic accounting for different dispersions
return(F.Mod)
###########NEW CODE ENDS HERE########################################
}
Example of Calculating modified F on the dune dataset
library(vegan)
data(dune)
data(dune.env)
Modified_F(dune~ Management, data = dune.env)
[1] 3.130212
For a one-way ANOVA with just a standard categorical factor I believe this would calculate the modified F (I'm getting the same modified F as the Fathom Toolbox implementation). I took the beginning of the
adonis
function and then added a block of code to calculateF.Mod
and for now commented out the standard wayF.Mod
is calculated. That'd need to be added into wheref.perm
is determined inadonis
for significance tests though and I'm not sure exactly how to do that. Also, I'm not sure how this would expand to cases more than just a standard one-way anova (though according to the paper, that is doable).Modified_F<- function(formula, data, permutations = 999, method = "bray", strata = NULL, contr.unordered = "contr.sum", contr.ordered = "contr.poly", parallel = getOption("mc.cores")){ EPS <- sqrt(.Machine$double.eps) TOL <- 1e-07 Terms <- terms(formula, data = data) lhs <- formula[[2]] lhs <- eval(lhs, data, parent.frame()) formula[[2]] <- NULL rhs.frame <- model.frame(formula, data, drop.unused.levels = TRUE) op.c <- options()$contrasts options(contrasts = c(contr.unordered, contr.ordered)) rhs <- model.matrix(formula, rhs.frame) options(contrasts = op.c) grps <- attr(rhs, "assign") qrhs <- qr(rhs) rhs <- rhs[, qrhs$pivot, drop = FALSE] rhs <- rhs[, 1:qrhs$rank, drop = FALSE] grps <- grps[qrhs$pivot][1:qrhs$rank] u.grps <- unique(grps) nterms <- length(u.grps) - 1 if (nterms < 1) stop("right-hand-side of formula has no usable terms") H.s <- lapply(2:length(u.grps), function(j) { Xj <- rhs[, grps %in% u.grps[1:j]] qrX <- qr(Xj, tol = TOL) Q <- qr.Q(qrX) tcrossprod(Q[, 1:qrX$rank]) }) if (inherits(lhs, "dist")) { if (any(lhs < -TOL)) {stop("dissimilarities must be non-negative")} dmat <- as.matrix(lhs^2) } else if ((is.matrix(lhs) || is.data.frame(lhs)) && isSymmetric(unname(as.matrix(lhs)))) { dmat <- as.matrix(lhs^2) lhs <- as.dist(lhs) } else { dist.lhs <- as.matrix(vegdist(lhs, method = method)) dmat <- dist.lhs^2 } n <- nrow(dmat) G <- -sweep(dmat, 1, rowMeans(dmat))/2 SS.Exp.comb <- sapply(H.s, function(hat) sum(G * t(hat))) SS.Exp.each <- c(SS.Exp.comb - c(0, SS.Exp.comb[-nterms])) H.snterm <- H.s[[nterms]] tIH.snterm <- t(diag(n) - H.snterm) if (length(H.s) > 1) for (i in length(H.s):2) H.s[[i]] <- H.s[[i]] - H.s[[i - 1]] SS.Res <- sum(G * tIH.snterm) df.Exp <- sapply(u.grps[-1], function(i) sum(grps == i)) df.Res <- n - qrhs$rank if (inherits(lhs, "dist")) { beta.sites <- qr.coef(qrhs, as.matrix(lhs)) beta.spp <- NULL } else { beta.sites <- qr.coef(qrhs, dist.lhs) beta.spp <- qr.coef(qrhs, as.matrix(lhs)) } colnames(beta.spp) <- colnames(lhs) colnames(beta.sites) <- rownames(lhs) #F.Mod <- (SS.Exp.each/df.Exp)/(SS.Res/df.Res) ###########NEW CODE STARTS HERE######################################## grp_ID<- as.character(interaction(as.data.frame(rhs))) #Reference to group membership of each observation n_per_group<- table(grp_ID) #Sample size of each group VL<- rep(0, length = length(unique(grp_ID))) #Initialize names(VL)<- names(n_per_group) for (i in 1:(n-1)){ for (j in (i+1):n) { d_ij2<- dmat[i,j] grp_i<-grp_ID[i] grp_j<- grp_ID[j] if(grp_i==grp_j){ Eij_L<- 1 } else{ Eij_L<- 0 } #Indicator whether observations i and j are in the same group nL<- n_per_group[grp_i] #number of observations in group L VL[grp_i]<- VL[grp_i]+ (Eij_L*d_ij2/(nL*(nL-1))) #Within Group Dispersion for each group } } F.Mod_denominator<- 0 for (i in 1:length(VL)) { F.Mod_denominator<- F.Mod_denominator + ((1 - (n_per_group[i]/n))*VL[i]) } names(F.Mod_denominator)<- NULL F.Mod_numerator<- SS.Exp.comb #Among Group Sum of Squares is equal to tr(HG), SS.Exp.comb seems to be that F.Mod<- F.Mod_numerator/F.Mod_denominator #Modified F statistic accounting for different dispersions return(F.Mod) ###########NEW CODE ENDS HERE######################################## }
Example of Calculating modified F on the dune dataset
library(vegan) data(dune) data(dune.env) Modified_F(dune~ Management, data = dune.env) [1] 3.130212
Hi, will I able to use this code for a 4-way PERMANOVA or just for one-way?
@pkmnsandy, this is unfortunately just for a one-way PERMANOVA. The Fathom implementation states that this is a "one-way (modified) PERMANOVA" and the original paper states "The F2 test statistic can be extended to allow for differences in dispersions of replicates within sites, and differences in dispersions of site centroids within regions, for relevant tests of individual factors at each spatial scale. Similar extensions can be formulated and derived for tests of individual terms in fixed, random or mixed multi-way ANOVA models including interactions. We shall leave these extensions (beyond the scope of the current contribution) for a future endeavour."