phyloseq
phyloseq copied to clipboard
estimate_richness fails -- Component taxa/OTU names do not match.
I know this issue has come up a bunch (e.g., here: https://github.com/joey711/phyloseq/issues/1018 and https://github.com/joey711/phyloseq/issues/552 - but I noticed its closed but with many people asking the same question, so I thought I'd ask again).
This code worked a couple years ago, but now I'm getting the following error:
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'object' in selecting a method for function 'otu_table': invalid class “phyloseq” object:
Component taxa/OTU names do not match.
Taxa indices are critical to analysis.
Try taxa_names()
(code provided below; ordination methods work fine)
I've implemented several of the changes mentioned across the posts with a similar error message but I cannot seem to plot or estimate richness.
I have update all packages. I have used subset_samples instead of prune_samples. I've used as.integer ... I would appreciate any advice.
I have saved the phyloseq objects ps and ps1 so that they could be inspected.
cheers. and thanks in advance!
# Read in OTU table
otu_table_in <- read.table("QIIME2_preprocessing//biom/otu_table_new.txt", sep = "\t", row.names = 1, header = TRUE)
otu_table_in <- as.matrix(otu_table_in)
# Read in taxonomy
# Separated by kingdom, phylum, class, order, family, genus, species
taxonomy <- read.table("QIIME2_preprocessing//biom/taxonomy.tsv", header=T, sep='\t')
taxonomy<- separate(taxonomy, Taxon, c("Kingdom","Phylum","Class","Order", "Family", "Genus", "Species"), sep= ";", remove=TRUE)
## make sure to order things so that tax_names match for both OTU table and taxonomy table ##
taxonomy <- taxonomy[match(rownames(otu_table_in), taxonomy$Feature.ID), ]
rownames(taxonomy) <- taxonomy$Feature.ID
taxonomy <- taxonomy %>% select(-Feature.ID)
taxonomy <- as.matrix(taxonomy)
## check rownames
all(rownames(otu_table_in) == rownames(taxonomy))
#TRUE
# Read in metadata
metadata <- read.table("QIIME2_preprocessing//biom/metadata.txt", row.names = 1, header=T)
# Create a variable to separate experiments
metadata$experiment <- ifelse(is.na(metadata$FOOD_2E), 0, 1) #1 denotes samples that are in expt: FOOD_2E
# Read in tree
phy_tree <- read_tree("QIIME2_preprocessing//biom/tree.nwk")
# Import all as phyloseq objects
OTU <- otu_table(otu_table_in, taxa_are_rows = TRUE)
TAX <- tax_table(taxonomy)
META <- sample_data(metadata)
# Sanity checks for consistent OTU names
head(taxa_names(TAX), 4)
head(taxa_names(OTU), 4)
# Same sample names
sample_names(OTU)
sample_names(META)
sample_names(OTU) <- sample_names(META) # double check this has worked
# Finally merge
ps <- phyloseq(OTU, TAX, META, phy_tree)
keep_samples <- rownames(subset(metadata, metadata$experiment==1))
ps1 <- prune_samples(sample_names(ps) %in% keep_samples, ps)
ps1 = filter_taxa(ps1, function(x) sum(x > 5) > (0.05 * length(x)), TRUE)
ps1 = transform_sample_counts(ps1, function(x) x / sum(x))
## Estimate richness
ps1.rich <-estimate_richness(transform_sample_counts(ps1, as.integer),
measures=c("Chao1", "Shannon"))
gets me the following error:
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'object' in selecting a method for function 'otu_table': invalid class “phyloseq” object:
Component taxa/OTU names do not match.
Taxa indices are critical to analysis.
Try taxa_names()
I have double checked that taxa and OTU names match:
> all(rownames(otu_table_in) == rownames(taxonomy))
[1] TRUE
> # Sanity checks for consistent OTU names
> head(taxa_names(TAX), 4)
[1] "4612c0d0098e033627c181ad74b0ac38" "06e2fddd55a126874c77aa6ebacb97f8"
[3] "ffbdce3deba8ab3b7dca9b31ad06dc21" "860fbf7bf96d9afcbe5eebf96800389d"
> head(taxa_names(OTU), 4)
[1] "4612c0d0098e033627c181ad74b0ac38" "06e2fddd55a126874c77aa6ebacb97f8"
[3] "ffbdce3deba8ab3b7dca9b31ad06dc21" "860fbf7bf96d9afcbe5eebf96800389d"
> ps1
phyloseq-class experiment-level object
otu_table() OTU Table: [ 517 taxa and 71 samples ]
sample_data() Sample Data: [ 71 samples by 11 sample variables ]
tax_table() Taxonomy Table: [ 517 taxa by 8 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 517 tips and 514 internal nodes ]
> setdiff(taxa_names(otu_table(ps1)), taxa_names(tax_table(ps1)))
character(0)
> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.1
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] vegan_2.6-4 lattice_0.21-8 permute_0.9-7 knitr_1.43 lubridate_1.9.2
[6] forcats_1.0.0 stringr_1.5.0 dplyr_1.1.2 purrr_1.0.1 readr_2.1.4
[11] tidyr_1.3.0 tibble_3.2.1 ggplot2_3.4.2 tidyverse_2.0.0 phyloseq_1.38.0
loaded via a namespace (and not attached):
[1] nlme_3.1-162 bitops_1.0-7 matrixStats_0.63.0
[4] bit64_4.0.5 RColorBrewer_1.1-3 httr_1.4.6
[7] GenomeInfoDb_1.30.1 tools_4.1.2 utf8_1.2.3
[10] R6_2.5.1 DBI_1.1.3 BiocGenerics_0.40.0
[13] mgcv_1.8-42 colorspace_2.1-0 rhdf5filters_1.6.0
[16] ade4_1.7-22 withr_2.5.0 tidyselect_1.2.0
[19] DESeq2_1.34.0 bit_4.0.5 compiler_4.1.2
[22] cli_3.6.1 Biobase_2.54.0 DelayedArray_0.20.0
[25] scales_1.2.1 genefilter_1.76.0 digest_0.6.31
[28] rmarkdown_2.21 XVector_0.34.0 pkgconfig_2.0.3
[31] htmltools_0.5.5 MatrixGenerics_1.6.0 highr_0.10
[34] fastmap_1.1.1 rlang_1.1.1 rstudioapi_0.14
[37] RSQLite_2.3.1 generics_0.1.3 jsonlite_1.8.4
[40] BiocParallel_1.28.3 RCurl_1.98-1.12 magrittr_2.0.3
[43] GenomeInfoDbData_1.2.7 biomformat_1.22.0 Matrix_1.5-1
[46] Rcpp_1.0.10 munsell_0.5.0 S4Vectors_0.32.4
[49] Rhdf5lib_1.16.0 fansi_1.0.4 ape_5.7-1
[52] lifecycle_1.0.3 stringi_1.7.12 yaml_2.3.7
[55] MASS_7.3-60 SummarizedExperiment_1.24.0 zlibbioc_1.40.0
[58] rhdf5_2.38.1 plyr_1.8.8 grid_4.1.2
[61] blob_1.2.4 parallel_4.1.2 crayon_1.5.2
[64] Biostrings_2.62.0 splines_4.1.2 multtest_2.50.0
[67] annotate_1.72.0 hms_1.1.3 KEGGREST_1.34.0
[70] locfit_1.5-9.7 pillar_1.9.0 igraph_1.4.2
[73] GenomicRanges_1.46.1 geneplotter_1.72.0 reshape2_1.4.4
[76] codetools_0.2-19 stats4_4.1.2 XML_3.99-0.14
[79] glue_1.6.2 evaluate_0.21 data.table_1.14.8
[82] tzdb_0.4.0 vctrs_0.6.2 png_0.1-8
[85] foreach_1.5.2 gtable_0.3.3 cachem_1.0.8
[88] xfun_0.39 xtable_1.8-4 survival_3.5-5
[91] iterators_1.0.14 AnnotationDbi_1.56.2 memoise_2.0.1
[94] IRanges_2.28.0 cluster_2.1.4 timechange_0.2.0
/
I guess the issue might be normalisation here -- where as.integer is failing.
Is it best to stick to raw counts for alpha diversity measures? Would the same be true for beta diversity (ordination methods)?
Hey, so I have had the same issue. Try running just shannon and it should work. I also have normalised data and shannon and simpon work, but Chao1 does not.
I hope this helps somewhat.