ANCOM-BC-Code-Archive
ANCOM-BC-Code-Archive copied to clipboard
Archive: Data, scripts, and outputs for the Nat. Comm. paper "Analysis of Composition of Microbiomes with Bias Correction". Please check our ANCOMBC R package for the most up-to-date ANCOM-BC function...
User Manual for ANCOM-BC Function
This is the repository archiving data and scripts for reproducing results presented in the Nat. Comm. paper ANCOM-BC.
For the corresponding R package, refer to ANCOMBC repository.
The current code implements ANCOM-BC in cross-sectional datasets for comparing the change of absolute abundance for each taxon among different experimental groups.
R-package dependencies
The following libraries need to be included for the R code to run:
library(dplyr)
library(nloptr)
Instructions for use
Data preprocess
Usage
-
feature_table_pre_process(feature.table, meta.data, sample.var, group.var, zero.cut, lib.cut, neg.lb)
Arguments
-
feature.table
: Data frame or matrix representing observed OTU table with OTUs (or taxa) in rows and samples in columns. -
meta.data
: Data frame or matrix of all variables and covariates of interest. -
sample.var
: Character. The name of column storing sample IDs. -
group.var
: Character. The name of the main variable of interest. ANCOM-BC v1.0 only supports discretegroup.var
and aims to compare the change of absolute abundance across different levels ofgroup.var
. -
zero.cut
: Numerical fraction between 0 and 1. Taxa with proportion of zeroes greater thanzero.cut
are not included in the analysis. -
lib.cut
: Numeric. Samples with library size less thanlib.cut
are not included in the analysis. -
neg.lb
: Logical. TRUE indicates a taxon would be classified as a structural zero in the corresponding experimental group using its asymptotic lower bound.
Value
-
feature.table
: A data frame of pre-processed OTU table. -
library.size
: A numeric vector of library sizes after pre-processing. -
group.name
: A character vector of levels ofgroup.var
. -
group.ind
: A numeric vector. Each sample is assigned to a number indicating its group label for better internal process. -
structure.zeros
: A matrix consists of 0 and 1s with 1 indicating the taxon is identified as a structural zero in the corresponding group.
ANCOM-BC main function
Usage:
-
ANCOM_BC(feature.table, grp.name, grp.ind, struc.zero, adj.method, tol.EM, max.iterNum, perNum, alpha)
Arguments:
-
feature.table
: Data frame or matrix representing the pre-processed OTU table with OTUs (or taxa) in rows and samples in columns. -
grp.name
: A character vector indicating the levels of group. -
grp.ind
: A numeric vector indicating group assignment for each sample. 1 corresponds to the 1st level ofgrp.name
, 2 corresponds to the 2nd level ofgrp.name
, etc. -
struc.zero
: A matrix consists of 0 and 1s with 1 indicating the taxon is identified as a structural zero in the corresponding group. -
adj.method
: Character. Returns p-values adjusted using the specified method, including"holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none"
. -
tol.EM
: Numeric. The iteration convergence tolerance for E-M algorithm. -
max.iterNum
: Numeric. The maximum number of iterations for E-M algorithm. -
perNum
: Numeric. The maximum number of permutations. This argument is active only if there exist more than 2 groups. -
alpha
: Numeric. Level of significance.
Value:
-
feature.table
: Data frame or matrix. Return the inputfeature.table
. -
res
: Data frame. The primary result of ANCOM-BC consisting of:-
mean.difference
: Numeric. The estimated mean difference of absolute abundance between groups in log scale (natural log); -
se
: Numeric. The standard error ofmean.difference
; -
W
: Numeric.mean.difference/se
, which is the test statistic of ANCOM-BC. -
p.val
: Numeric. P-value obtained from two-sided Z-test using the test statisticW
. -
q.val
. Numeric. Q-value obtained by applyingadj.method
top-val
. -
diff.abn
. Logical. TRUE if the taxon hasq.val
less thanalpha
.
-
-
d
: A numeric vector. Estimated sampling fractions in log scale (natural log). -
mu
: A numeric vector. Estimated log (natural log) mean absolute abundance for each group. -
bias.em
: Numeric. Estimated mean difference of log (natural log) sampling fractions between groups through E-M algorithm. -
bias.wls
: Numeric. Estimated mean difference of log (natural log) sampling fractions between groups through weighted least squares.
Flowchart of ANCOM-BC

Examples
# Load example data
data(dietswap)
pseq = dietswap
n_taxa = ntaxa(pseq)
n_samp = nsamples(pseq)
# Metadata
meta_data = meta(pseq)
# Taxonomy table
taxonomy = tax_table(pseq)
# Absolute abundances
otu_absolute = abundances(pseq)
Two-group comparison
# Pre-processing
feature.table = otu_absolute; sample.var = "sample"; group.var = "nationality";
zero.cut = 0.90; lib.cut = 1000; neg.lb = TRUE
pre.process = feature_table_pre_process(feature.table, meta_data, sample.var,
group.var, zero.cut, lib.cut, neg.lb)
feature.table = pre.process$feature.table
group.name = pre.process$group.name
group.ind = pre.process$group.ind
struc.zero = pre.process$structure.zeros
# Paras for ANCOM-BC
grp.name = group.name; grp.ind = group.ind; adj.method = "bonferroni"
tol.EM = 1e-5; max.iterNum = 100; perNum = 1000; alpha = 0.05
out = ANCOM_BC(feature.table, grp.name, grp.ind, struc.zero,
adj.method, tol.EM, max.iterNum, perNum, alpha)
res = cbind(taxon = rownames(out$feature.table), out$res)
write_csv(res, "demo_two_group.csv")
Expected run time: 6s (R version 3.5.1 (2018-07-02); Platform: x86_64-apple-darwin15.6.0 (64-bit); Running under: macOS 10.15.1.)
Multi-group comparison
# Pre-processing
feature.table = otu_absolute; sample.var = "sample"; group.var = "bmi_group";
zero.cut = 0.90; lib.cut = 1000; neg.lb = TRUE
pre.process = feature_table_pre_process(feature.table, meta_data, sample.var,
group.var, zero.cut, lib.cut, neg.lb)
feature.table = pre.process$feature.table
group.name = pre.process$group.name
group.ind = pre.process$group.ind
struc.zero = pre.process$structure.zeros
# Paras for ANCOM-BC
grp.name = group.name; grp.ind = group.ind; adj.method = "bonferroni"
tol.EM = 1e-5; max.iterNum = 100; perNum = 1000; alpha = 0.05
out = ANCOM_BC(feature.table, grp.name, grp.ind, struc.zero,
adj.method, tol.EM, max.iterNum, perNum, alpha)
res = cbind(taxon = rownames(out$feature.table), out$res)
write_csv(res, "demo_multi_group.csv")
Expected run time: 19s (R version 3.5.1 (2018-07-02); Platform: x86_64-apple-darwin15.6.0 (64-bit); Running under: macOS 10.15.1.)