Transform to phyloseq objects (Error in validObject(.Object) : invalid class “phyloseq” object: Component sample names do not match. Try sample_names())

Open Aladin1342 opened this issue 4 years ago • 10 comments

Dear all,

I am new to phyloseq. While working through the phyloseq tutorial example created by Daniel Vaulot (https://github.com/vaulot/R_tutorials/archive/master.zip), I ran into a problem at the step that transforms the data into phyloseq objects. This is the message I get every time I try it:

Transform to phyloseq objects

  OTU = otu_table(otu_mat, taxa_are_rows = TRUE)
  TAX = tax_table(tax_mat)
  samples = sample_data(samples_df)
  
  carbom <- phyloseq(OTU, TAX, samples)
  carbom

Error in validObject(.Object) : invalid class “phyloseq” object: Component sample names do not match. Try sample_names() *

From what I understand, either sample_names and row.names do not match, or RStudio failed to create the objects OTU, TAX and samples.
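A minimal sketch of how the mismatch could be located, using base R only. The two small tables below are made-up stand-ins for the tutorial's `otu_mat` and `samples_df`; their values are hypothetical and only illustrate the case where the metadata row names were never set.

```r
# Hypothetical stand-ins for the tutorial objects; the values are made up.
otu_mat <- matrix(
  c(10, 0, 3,
     5, 2, 8),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Otu001", "Otu002"), c("S1", "S2", "S3"))
)

# Metadata whose row names were never set: R falls back to "1", "2", "3".
samples_df <- data.frame(
  sample = c("S1", "S2", "S3"),
  depth  = c(5, 20, 100)
)

otu_names  <- colnames(otu_mat)     # samples are the columns when taxa_are_rows = TRUE
meta_names <- rownames(samples_df)  # sample names are expected in the row names

# Names present in one component but not in the other.
missing_in_meta <- setdiff(otu_names, meta_names)
missing_in_otu  <- setdiff(meta_names, otu_names)

print(missing_in_meta)
print(missing_in_otu)
```

If both vectors are empty, the names agree and the cause lies elsewhere; if not, the printed names show which side needs fixing.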

Could you please help me out with this? I have been struggling for 6 days to find a solution.

Thank you.

Aladin

P.S. I am using the tutorial's data without any changes.

Aladin1342 avatar Jul 13 '20 22:07 Aladin1342

I have exactly the same problem as Aladin, also with my other DADA2 results...is there something wrong with the package?

Misemi9 avatar Oct 27 '20 16:10 Misemi9

Was there a solution to this? I'm running into the same problem as well when running the tutorial.

jfoldi81 avatar Dec 21 '20 04:12 jfoldi81

For my data I solved the problem by changing the format of the import file to .csv and defining ";" as the separator. Before that, I had imported the metadata as a text file. I haven't tried this for the Vaulot tutorial, though.
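A small base R sketch of why the separator can matter. The inline text below is a made-up, semicolon-separated metadata file (the column names and values are hypothetical); it is read once with the default comma separator and once with `sep = ";"`.

```r
# Hypothetical semicolon-separated metadata, inlined so the example is self-contained.
csv_text <- "sample;site;depth\nS1;A;5\nS2;B;20\nS3;A;100"

# Default separator (","): no commas are found, so every line becomes one single field.
wrong <- read.csv(text = csv_text)

# Matching separator, with the first column used as row names.
right <- read.csv(text = csv_text, sep = ";", row.names = 1)

print(ncol(wrong))      # one merged column instead of three
print(rownames(right))  # the sample IDs
print(colnames(right))  # the remaining metadata columns
```

With the wrong separator the sample IDs never become row names, which is one way the names in the metadata can end up differing from those in the OTU table.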

Misemi9 avatar Jan 06 '21 14:01 Misemi9

Thanks Misemi9, this solved the problem!

amsyarimorni avatar Feb 01 '21 12:02 amsyarimorni

I think the row.names need to be assigned after using sample_data. I followed the link below to solve the problem: https://github.com/joey711/phyloseq/issues/1020
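A minimal base R sketch of assigning row names from an ID column. The data frame and the `otu_names` vector are hypothetical; the comment about `sample_names()` describes the phyloseq accessor as I understand it and is not run here.

```r
# Hypothetical metadata with the sample IDs stored in an ordinary column.
samples_df <- data.frame(
  sample = c("S1", "S2", "S3"),
  depth  = c(5, 20, 100),
  stringsAsFactors = FALSE
)

# Hypothetical sample names as they appear in the OTU table.
otu_names <- c("S1", "S2", "S3")

# Move the ID column into the row names.
rownames(samples_df) <- samples_df$sample
samples_df$sample <- NULL

# With phyloseq loaded, the analogous step on an existing sample_data object
# would presumably be: sample_names(samples) <- otu_names   (not run here)

names_match <- setequal(rownames(samples_df), otu_names)
print(names_match)
```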

tumu0901 avatar Feb 24 '21 02:02 tumu0901

Dear friends

I am sorry for bringing this topic up again. I am new to RStudio, I have the same issue, and I can't solve it. I have tried everything you have mentioned, but I still get the same error.

I run

otu <- read.table(file = "feature-table.tsv", sep = "\t", header = T, row.names = 2, skip = 1, comment.char = "")
taxonomy <- read.table(file = "taxonomy.tsv", sep = "\t", header = T, row.names = 1)

clean the taxonomy, Greengenes format

tax <- taxonomy %>%
  select(Taxon) %>%
  separate(Taxon, c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"), "; ")

tax.clean <- data.frame(
  row.names = row.names(tax),
  Kingdom = str_replace(tax[,1], "k__", ""),
  Phylum = str_replace(tax[,2], "p__", ""),
  Class = str_replace(tax[,3], "c__", ""),
  Order = str_replace(tax[,4], "o__", ""),
  Family = str_replace(tax[,5], "f__", ""),
  Genus = str_replace(tax[,6], "g__", ""),
  Species = str_replace(tax[,7], "s__", ""),
  stringsAsFactors = FALSE
)

tax.clean[is.na(tax.clean)] <- ""
tax.clean[tax.clean == "__"] <- ""

for (i in 1:nrow(tax.clean)) {
  if (tax.clean[i,7] != "") {
    tax.clean$Species[i] <- paste(tax.clean$Genus[i], tax.clean$Species[i], sep = " ")
  } else if (tax.clean[i,2] == "") {
    kingdom <- paste("Unclassified", tax.clean[i,1], sep = " ")
    tax.clean[i, 2:7] <- kingdom
  } else if (tax.clean[i,3] == "") {
    phylum <- paste("Unclassified", tax.clean[i,2], sep = " ")
    tax.clean[i, 3:7] <- phylum
  } else if (tax.clean[i,4] == "") {
    class <- paste("Unclassified", tax.clean[i,3], sep = " ")
    tax.clean[i, 4:7] <- class
  } else if (tax.clean[i,5] == "") {
    order <- paste("Unclassified", tax.clean[i,4], sep = " ")
    tax.clean[i, 5:7] <- order
  } else if (tax.clean[i,6] == "") {
    family <- paste("Unclassified", tax.clean[i,5], sep = " ")
    tax.clean[i, 6:7] <- family
  } else if (tax.clean[i,7] == "") {
    tax.clean$Species[i] <- paste("Unclassified ", tax.clean$Genus[i], sep = " ")
  }
}

metadata <- read.table(file = "sample-metadata.tsv", sep = "\t", header = T, row.names = 1)
OTU = otu_table(as.matrix(otu), taxa_are_rows = TRUE)
TAX = tax_table(as.matrix(tax.clean))
SAMPLE <- sample_data(metadata)
TREE = read_tree("tree.nwk")

merge the data

ps <- phyloseq(OTU, TAX, SAMPLE,TREE)

and got

Error in validObject(.Object) : invalid class “phyloseq” object: Component sample names do not match. Try sample_names()

Maybe it is because the name of the first sample (1rA) is cut off by the program, and I can't figure out how to keep it (see the RStudio screenshot).
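Without access to the actual file this is only a guess, but two `read.table` behaviours could make a sample name such as 1rA disappear or change. The sketch below uses a small made-up table in the layout of a typical exported feature table (a comment line, then a header starting with "#OTU ID"); the OTU IDs and counts are hypothetical. It shows that `row.names = 2` consumes the second column as row names, and that the default `check.names = TRUE` prefixes an "X" to column names starting with a digit.

```r
# Hypothetical feature table in the usual exported layout; all values are made up.
tsv_text <- paste(
  "# Constructed from biom file",
  "#OTU ID\t1rA\t2rB\t3rC",
  "otu_a\t10\t0\t3",
  "otu_b\t5\t2\t8",
  sep = "\n"
)

# Call as posted: the second column (sample 1rA) is turned into row names,
# so it is no longer available as a sample column.
as_posted <- read.table(text = tsv_text, sep = "\t", header = TRUE,
                        row.names = 2, skip = 1, comment.char = "")
has_1rA <- any(grepl("1rA", colnames(as_posted)))

# First column as row names, default check.names = TRUE:
# names starting with a digit get an "X" prefix.
prefixed <- read.table(text = tsv_text, sep = "\t", header = TRUE,
                       row.names = 1, skip = 1, comment.char = "")

# First column as row names, names kept exactly as written in the file.
unchanged <- read.table(text = tsv_text, sep = "\t", header = TRUE,
                        row.names = 1, skip = 1, comment.char = "",
                        check.names = FALSE)

print(has_1rA)
print(colnames(prefixed))
print(colnames(unchanged))
```

If the real file follows this layout, `row.names = 1` together with `check.names = FALSE` would keep all sample names as they appear in the metadata; if the layout differs, the column index would need to be adapted.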

Here are some screenshots. The sample names are not in order, they are mixed up. I have 72 samples.

I would appreciate any help

[Screenshots: RStudio, metadata, feature table]

ShevchenkoAlla avatar Dec 29 '21 19:12 ShevchenkoAlla

@ShevchenkoAlla The error you are getting probably means that the sample names in the OTU table and the metadata file do not match.

hildaha avatar Jan 10 '22 16:01 hildaha

Thank you! But as I mentioned before, the sample names are not in order in the OTU table. I don't know why qiime2 made them that way; they are just mixed up.
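A short base R sketch for separating "different order" from "different names". As far as I know, phyloseq matches components by name, so order alone should not trigger this error, but that is an assumption and not checked against the package source here. The sample names below are hypothetical.

```r
# Hypothetical sample names: same set, different order.
otu_names <- c("10rB", "1rA", "2rA", "1rB")
metadata <- data.frame(
  group = c("A", "A", "B", "B"),
  row.names = c("1rA", "1rB", "2rA", "10rB")
)

same_set   <- setequal(otu_names, rownames(metadata))   # ignores order
same_order <- identical(otu_names, rownames(metadata))  # order-sensitive

# Reorder the metadata rows to follow the OTU table's column order.
metadata_sorted <- metadata[match(otu_names, rownames(metadata)), , drop = FALSE]

print(same_set)
print(same_order)
print(rownames(metadata_sorted))
```

If `same_set` is FALSE on the real data, `setdiff()` in both directions shows which names differ, for example because of an added prefix or a missing sample.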

Everything should be matched

[Screenshot from 2022-01-10 23-27-50]

ShevchenkoAlla avatar Jan 10 '22 20:01 ShevchenkoAlla

Hi, I had the same issue.

Error in validObject(.Object) : invalid class “phyloseq” object: Component sample names do not match. Try sample_names()

For me, the issue was solved by specifying which column holds my row names, so phyloseq knows what to match the otu_table to. This was mentioned in one of the comments above. When you import your metadata (CSV file), you need to specify the separator and row.names (in my case the first column, so row.names = 1).

This is the whole code I used to create a phyloseq object (that worked for me):

# Based on http://benjjneb.github.io/dada2/tutorial.html ("Handoff to phyloseq")

Aim: Create a phyloseq object with all data for further analysis

Load required packages

library(phyloseq)
library(Biostrings)
library(ggplot2)
library(readxl)
theme_set(theme_bw())

Load taxonomic assignment and otu table (rds files need to be imported to R and stored as an object)

taxa <- readRDS("/Users/admin/HPC/12-07-23/taxonomy_table.rds")
seqtab.nochim <- readRDS("/Users/admin/HPC/12-07-23/sequence_table.rds")

Load the metadata (CSV file) into a data frame

metadata <- read.csv("/Users/admin/metadata/metadata_microbiome_27.09.23.csv", sep = ";", row.names = 1)

Create the phyloseq object

ps <- phyloseq(
  otu_table(seqtab.nochim, taxa_are_rows = FALSE),
  sample_data(metadata),  # sample names are taken from the row names of the metadata data frame
  tax_table(taxa)         # include taxonomic information if available
)

...

klassLG avatar Sep 27 '23 12:09 klassLG

Hi all,

I am currently facing an issue while working with the microbiome package in R and would greatly appreciate your insights.

> b.lgg <- divergence(subset_samples(physeq, Description == "Stool_controls"),
+                     apply(abundances(subset_samples(physeq, Description == "Stool_controls")), 1, median))
> b.pla <- divergence(subset_samples(physeq, Description == "Stool_samples"),
+                     apply(abundances(subset_samples(physeq, Description == "Stool_samples")), 1, median))
Error in validObject(.Object) : invalid class “phyloseq” object: 
 Component sample names do not match.
 Try sample_names()


 > sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.2

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Athens
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] jeevanuDB_0.0.01    RColorBrewer_1.1-3  lubridate_1.9.3    
 [4] forcats_1.0.0       stringr_1.5.1       dplyr_1.1.4        
 [7] purrr_1.0.2         readr_2.1.4         tidyr_1.3.0        
[10] tibble_3.2.1        tidyverse_2.0.0     knitr_1.45         
[13] microbiome_1.24.0   ggplot2_3.4.4       devtools_2.4.5     
[16] usethis_2.2.2       BiocManager_1.30.22 phyloseq_1.46.0    

loaded via a namespace (and not attached):
 [1] bitops_1.0-7            remotes_2.4.2.1        
 [3] permute_0.9-7           rlang_1.1.2            
 [5] magrittr_2.0.3          ade4_1.7-22            
 [7] compiler_4.3.1          mgcv_1.9-0             
 [9] vctrs_0.6.5             reshape2_1.4.4         
[11] profvis_0.3.8           pkgconfig_2.0.3        
[13] crayon_1.5.2            fastmap_1.1.1          
[15] XVector_0.42.0          ellipsis_0.3.2         
[17] caTools_1.18.2          utf8_1.2.4             
[19] promises_1.2.1          tzdb_0.4.0             
[21] sessioninfo_1.2.2       xfun_0.41              
[23] zlibbioc_1.48.0         cachem_1.0.8           
[25] GenomeInfoDb_1.38.1     jsonlite_1.8.8         
[27] biomformat_1.30.0       later_1.3.2            
[29] rhdf5filters_1.14.1     Rhdf5lib_1.24.1        
[31] parallel_4.3.1          cluster_2.1.6          
[33] R6_2.5.1                stringi_1.8.3          
[35] pkgload_1.3.3           Rcpp_1.0.11            
[37] iterators_1.0.14        IRanges_2.36.0         
[39] timechange_0.2.0        httpuv_1.6.13          
[41] Matrix_1.6-4            splines_4.3.1          
[43] igraph_1.6.0            tidyselect_1.2.0       
[45] rstudioapi_0.15.0       vegan_2.6-4            
[47] gplots_3.1.3            codetools_0.2-19       
[49] miniUI_0.1.1.1          pkgbuild_1.4.3         
[51] lattice_0.22-5          plyr_1.8.9             
[53] Biobase_2.62.0          shiny_1.8.0            
[55] withr_2.5.2             Rtsne_0.17             
[57] survival_3.5-7          urlchecker_1.0.1       
[59] Biostrings_2.70.1       pillar_1.9.0           
[61] KernSmooth_2.23-22      foreach_1.5.2          
[63] stats4_4.3.1            generics_0.1.3         
[65] RCurl_1.98-1.13         hms_1.1.3              
[67] S4Vectors_0.40.2        munsell_0.5.0          
[69] scales_1.3.0            gtools_3.9.5           
[71] xtable_1.8-4            glue_1.6.2             
[73] tools_4.3.1             data.table_1.14.10     
[75] fs_1.6.3                rhdf5_2.46.1           
[77] grid_4.3.1              ape_5.7-1              
[79] colorspace_2.1-0        nlme_3.1-164           
[81] GenomeInfoDbData_1.2.11 cli_3.6.2              
[83] fansi_1.0.6             gtable_0.3.4           
[85] digest_0.6.33           BiocGenerics_0.48.1    
[87] htmlwidgets_1.6.4       memoise_2.0.1          
[89] htmltools_0.5.7         multtest_2.58.0        
[91] lifecycle_1.0.4         mime_0.12              
[93] MASS_7.3-60            

The code shown above attempts to calculate divergence for subsets of my phyloseq object based on sample descriptions ("Stool_controls" and "Stool_samples").

I have verified that the sample names within each subset match using:

> rownames(sample_data(physeq))
 [1] "1"   "2"   "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"  "11" 
[12] "12"  "13"  "14"  "15"  "16"  "17"  "18"  "19"  "20"  "21"  "22" 
[23] "23"  "24"  "25"  "26"  "27"  "28"  "29"  "30"  "31"  "32"  "33" 
[34] "34"  "35"  "36"  "37"  "38"  "39"  "40"  "41"  "42"  "43"  "44" 
[45] "45"  "46"  "47"  "48"  "49"  "C1"  "C2"  "C4"  "C5"  "C6"  "C7" 
[56] "C8"  "C9"  "C10" "C11" "C12" "C13" "C14" "C15" "C16" "C17" "C18"
[67] "C19" "C20" "C21" "C22" "C23" "C24" "C25" "C26" "C27" "C28" "C29"
[78] "C30" "C31" "C32" "C33" "C34" "C35" "C36" "C37" "C38" "C39" "C40"
[89] "C41" "C42" "C43" "C44" "C45" "C46" "C47"
> sample_names(physeq) 
 [1] "1"   "2"   "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"  "11" 
[12] "12"  "13"  "14"  "15"  "16"  "17"  "18"  "19"  "20"  "21"  "22" 
[23] "23"  "24"  "25"  "26"  "27"  "28"  "29"  "30"  "31"  "32"  "33" 
[34] "34"  "35"  "36"  "37"  "38"  "39"  "40"  "41"  "42"  "43"  "44" 
[45] "45"  "46"  "47"  "48"  "49"  "C1"  "C2"  "C4"  "C5"  "C6"  "C7" 
[56] "C8"  "C9"  "C10" "C11" "C12" "C13" "C14" "C15" "C16" "C17" "C18"
[67] "C19" "C20" "C21" "C22" "C23" "C24" "C25" "C26" "C27" "C28" "C29"
[78] "C30" "C31" "C32" "C33" "C34" "C35" "C36" "C37" "C38" "C39" "C40"
[89] "C41" "C42" "C43" "C44" "C45" "C46" "C47"

If anyone has encountered a similar issue or has insights into why this might be happening, I would greatly appreciate your help.

If there are alternative approaches to calculate divergence for specific sample subsets in phyloseq, I am open to suggestions.

Thank you in advance for your time and assistance.

kbenmd avatar Dec 16 '23 19:12 kbenmd