
otu_table, tax_data and sample_data only contain NA after parsing from phyloseq

Open FFoQS90 opened this issue 6 years ago • 12 comments

Hi,

I am trying to parse my phyloseq object into a taxmap object, but I can't get it to work properly. The parsing seems to work fine (no error message), but when I look at the taxmap object it looks like this:

1284 taxa: aab. Bacteria ... bxk. TACGTAGGGGGCGA[truncated]
1284 edges: NA->aab, NA->aac, aab->aad, aab->aae, aab->aaf ... anv->bxh, anu->bxi, axg->bxj, anu->bxk
4 data sets:
  otu_table:
    NA NA NANANA NANANA NANANA
  tax_data:
    NA NA NANANA NANANA NANANA
  sample_data:
    NA NA
  phy_tree:

  Phylogenetic tree with 680 tips and 678 internal nodes.
  
  Tip labels:
  	ASV1, ASV2, ASV3, ASV4, ASV5, ASV6, ...
  Node labels:
  	, 0.872, 0.889, 0.000, 0.737, 0.857, ...
  
  Unrooted; includes branch lengths.

0 functions:

Any idea what's causing this and how to resolve it?

FFoQS90 avatar Oct 11 '19 00:10 FFoQS90

Hi @FFoQS90,

That error looks familiar, but I can't remember what causes it. Can you send me your phyloseq object (or another that causes the same problem) using the save function in R? You can email me at zacharyfoster1989 gmail.com if you don't want to attach it here. Thanks
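
For reference, a minimal sketch of sharing a single object with base R's saveRDS() and readRDS(), which store one object per file. The list below is only a stand-in for a real phyloseq object, and the file name is a placeholder:

```r
# Stand-in for a phyloseq object; any R object is saved the same way.
pobj <- list(otu_ids = c("ASV1", "ASV2"), counts = c(23L, 5L))

# Write the single object to an .rds file (placeholder path in a temp dir).
rds_path <- file.path(tempdir(), "pobj.rds")
saveRDS(pobj, rds_path)

# readRDS() returns the object, so the recipient chooses the variable name.
restored <- readRDS(rds_path)
identical(pobj, restored)
```

Unlike save() and load(), which restore objects under their original names, readRDS() returns the object as a value, so it does not overwrite anything in the recipient's workspace.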

zachary-foster avatar Oct 11 '19 00:10 zachary-foster

Sorry for the delay. I can't reproduce that problem:

library(metacoder)
#> Loading required package: taxa
#> This is metacoder verison 0.3.3 (stable)
x = readRDS('~/Downloads/pobj.rds')
parse_phyloseq(x)
#> Loading required package: phyloseq
#> 
#> Attaching package: 'phyloseq'
#> The following object is masked from 'package:taxa':
#> 
#>     filter_taxa
#> <Taxmap>
#>   1284 taxa: aab. Bacteria ... bxk. TACGTAGGGGGCGA[truncated]
#>   1284 edges: NA->aab, NA->aac ... axg->bxj, anu->bxk
#>   4 data sets:
#>     otu_table:
#>       # A tibble: 680 x 52
#>         taxon_id otu_id   `1`  `10`  `11`  `12`  `13`  `14`  `15`
#>         <chr>    <chr>  <int> <int> <int> <int> <int> <int> <int>
#>       1 axh      ASV1      23    19    28    40    20    77    94
#>       2 axi      ASV2       5     4     1     2     0    17     2
#>       3 axj      ASV3       5     6     0    19     5     6    18
#>       # … with 677 more rows, and 43 more variables: `16` <int>,
#>       #   `17` <int>, `18` <int>, `19` <int>, `2` <int>,
#>       #   `20` <int>, `21` <int>, `22` <int>, `23` <int>,
#>       #   `24` <int>, …
#>     tax_data:
#>       # A tibble: 680 x 10
#>         taxon_id otu_id Kingdom Phylum Class Order Family Genus
#>         <chr>    <chr>  <chr>   <chr>  <chr> <chr> <chr>  <chr>
#>       1 axh      ASV1   Bacter… Firmi… Baci… Baci… Bacil… Virg…
#>       2 axi      ASV2   Bacter… Firmi… Baci… Baci… Bacil… Virg…
#>       3 axj      ASV3   Bacter… Firmi… Baci… Baci… Bacil… Orni…
#>       # … with 677 more rows, and 2 more variables:
#>       #   Species <chr>, ASV <chr>
#>     sample_data:
#>       # A tibble: 50 x 28
#>         sample_id Sample.ID Treatment Days  Strain Combo Moisture
#>         <chr>     <chr>     <chr>     <chr> <chr>  <chr> <chr>   
#>       1 1         1         PLE       0     SH.28… PLE.… 0.182   
#>       2 10        10        PLE       14    SH.28… PLE.… 0.142   
#>       3 11        11        PLE       14    SH.28… PLE.… 0.165   
#>       # … with 47 more rows, and 21 more variables:
#>       #   Salmonella <chr>, pH <chr>, Aw <chr>, TOC <chr>,
#>       #   TC <chr>, IC <chr>, TN <chr>, NH4 <chr>, NO3 <chr>,
#>       #   Al <chr>, …
#>     phy_tree:
#>       
#>       Phylogenetic tree with 680 tips and 678 internal nodes.
#>       
#>       Tip labels:
#>          ASV1, ASV2, ASV3, ASV4, ASV5, ASV6, ...
#>       Node labels:
#>          , 0.872, 0.889, 0.000, 0.737, 0.857, ...
#>       
#>       Unrooted; includes branch lengths.
#>   0 functions:

Created on 2019-10-15 by the reprex package (v0.3.0)

Can you reinstall metacoder and phyloseq and try again? Also, your sessionInfo() might be useful. Thanks!

zachary-foster avatar Oct 15 '19 20:10 zachary-foster

Thank you for looking into this. I reinstalled phyloseq and metacoder, but I still have the same problem. Here is my sessionInfo:

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=de_AT.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_AT.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=de_AT.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=de_AT.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] metacoder_0.3.3 taxa_0.3.2      phyloseq_1.28.0 dada2_1.12.1    Rcpp_1.0.2     

loaded via a namespace (and not attached):
  [1] colorspace_1.4-1            hwriter_1.3.2               htmlTable_1.13.2            XVector_0.24.0             
  [5] GenomicRanges_1.36.1        base64enc_0.1-3             rstudioapi_0.10             farver_1.1.0               
  [9] remotes_2.1.0               ggrepel_0.8.1               ggnet_0.1.0                 bit64_0.9-7                
 [13] AnnotationDbi_1.46.1        lubridate_1.7.4             codetools_0.2-16            splines_3.6.1              
 [17] doParallel_1.0.15           geneplotter_1.62.0          knitr_1.25                  polyclip_1.10-0            
 [21] zeallot_0.1.0               ade4_1.7-13                 Formula_1.2-3               jsonlite_1.6               
 [25] Rsamtools_2.0.3             annotate_1.62.0             cluster_2.1.0               ggforce_0.3.1              
 [29] compiler_3.6.1              httr_1.4.1                  backports_1.1.5             assertthat_0.2.1           
 [33] Matrix_1.2-17               lazyeval_0.2.2              tsnemicrobiota_0.1.0        cli_1.1.0                  
 [37] tweenr_1.0.1                acepack_1.4.1               htmltools_0.4.0             tools_3.6.1                
 [41] igraph_1.2.4.1              gtable_0.3.0                glue_1.3.1                  GenomeInfoDbData_1.2.1     
 [45] reshape2_1.4.3              dplyr_0.8.3                 ShortRead_1.42.0            Biobase_2.44.0             
 [49] vctrs_0.2.0                 Biostrings_2.52.0           multtest_2.40.0             ape_5.3                    
 [53] nlme_3.1-141                iterators_1.0.12            xfun_0.10                   stringr_1.4.0              
 [57] network_1.15                lifecycle_0.1.0             XML_3.98-1.20               zlibbioc_1.30.0            
 [61] MASS_7.3-51.4               scales_1.0.0                parallel_3.6.1              SummarizedExperiment_1.14.1
 [65] biomformat_1.12.0           rhdf5_2.28.1                RColorBrewer_1.1-2          memoise_1.1.0              
 [69] gridExtra_2.3               ggplot2_3.2.1               rpart_4.1-15                latticeExtra_0.6-28        
 [73] stringi_1.4.3               RSQLite_2.1.2               genefilter_1.66.0           S4Vectors_0.22.1           
 [77] foreach_1.4.7               checkmate_1.9.4             permute_0.9-5               BiocGenerics_0.30.0        
 [81] BiocParallel_1.18.1         GenomeInfoDb_1.20.0         rlang_0.4.0                 pkgconfig_2.0.3            
 [85] matrixStats_0.55.0          bitops_1.0-6                lattice_0.20-38             purrr_0.3.2                
 [89] Rhdf5lib_1.6.2              GenomicAlignments_1.20.1    htmlwidgets_1.5.1           cowplot_1.0.0              
 [93] bit_1.1-14                  tidyselect_0.2.5            plyr_1.8.4                  magrittr_1.5               
 [97] DESeq2_1.24.0               R6_2.4.0                    IRanges_2.18.3              Hmisc_4.2-0                
[101] DelayedArray_0.10.0         DBI_1.0.0                   pillar_1.4.2                foreign_0.8-72             
[105] mgcv_1.8-29                 survival_2.44-1.1           RCurl_1.95-4.12             nnet_7.3-12                
[109] tibble_2.1.3                crayon_1.3.4                plotly_4.9.0                locfit_1.5-9.1             
[113] grid_3.6.1                  data.table_1.12.4           blob_1.2.0                  vegan_2.5-6                
[117] digest_0.6.21               xtable_1.8-4                tidyr_1.0.0                 RcppParallel_4.4.4         
[121] stats4_3.6.1                munsell_0.5.0               viridisLite_0.3.0           ampvis2_2.5.1

Thanks!

FFoQS90 avatar Oct 15 '19 21:10 FFoQS90

Hmm, that all looks fine.

The only other time I have seen that error was in a workshop where people were loading phyloseq/taxmap objects from .RData files, but I'm not sure that is the cause.

Can you send me the code you used to get to this point?

Unfortunately, I need to reproduce the bug before I can fix it.

zachary-foster avatar Oct 16 '19 18:10 zachary-foster

I just tried it again in a brand-new R project with no other packages loaded, and it worked. I guess one of the attached or loaded packages in my original session caused the problem. Here is the code, with sessionInfo, that worked for me. Maybe this helps in tracking down the bug.

Thanks for your help.

> p <- readRDS("path_to/pobj.rds")
Loading required package: phyloseq
> library(metacoder)
Loading required package: taxa

Attaching package: ‘taxa’

The following object is masked from ‘package:phyloseq’:

    filter_taxa

This is metacoder verison 0.3.3 (stable)
> metaphyseq <- parse_phyloseq(p)
> metaphyseq
<Taxmap>
  1284 taxa: aab. Bacteria ... bxk. TACGTAGGGGGCGA[truncated]
  1284 edges: NA->aab, NA->aac, aab->aad, aab->aae, aab->aaf ... anw->bxg, anv->bxh, anu->bxi, axg->bxj, anu->bxk
  4 data sets:
    otu_table:
      # A tibble: 680 x 52
        taxon_id otu_id   `1`  `10`  `11`  `12`  `13`  `14`  `15`  `16`  `17`  `18`  `19`   `2`  `20`  `21`  `22`
        <chr>    <chr>  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
      1 axh      ASV1      23    19    28    40    20    77    94    45    22    21    35    30    37    99    47
      2 axi      ASV2       5     4     1     2     0    17     2     2     0     0     0    23     0     0     0
      3 axj      ASV3       5     6     0    19     5     6    18     1     0     7     4     1     3     4     0
      # … with 677 more rows, and 35 more variables: `23` <int>, `24` <int>, `25` <int>, `26` <int>, `27` <int>,
      #   `28` <int>, `29` <int>, `3` <int>, `30` <int>, `31` <int>, …
    tax_data:
      # A tibble: 680 x 10
        taxon_id otu_id Kingdom  Phylum   Class  Order  Family  Genus    Species     ASV                             
        <chr>    <chr>  <chr>    <chr>    <chr>  <chr>  <chr>   <chr>    <chr>       <chr>                           
      1 axh      ASV1   Bacteria Firmicu… Bacil… Bacil… Bacill… Virgiba… Genus_Virg… TACGTAGGGGGCAAGCGTTGTCCGGAATTAT…
      2 axi      ASV2   Bacteria Firmicu… Bacil… Bacil… Bacill… Virgiba… Genus_Virg… TACGTAGGGGGCGAGCGTTGTCCGGAATTAT…
      3 axj      ASV3   Bacteria Firmicu… Bacil… Bacil… Bacill… Ornithi… Genus_Orni… TACGTAGGGGGCAAGCGTTGTCCGGAATTAT…
      # … with 677 more rows
    sample_data:
      # A tibble: 50 x 28
        sample_id Sample.ID Treatment Days  Strain Combo Moisture Salmonella pH    Aw    TOC   TC    IC    TN   
        <chr>     <chr>     <chr>     <chr> <chr>  <chr> <chr>    <chr>      <chr> <chr> <chr> <chr> <chr> <chr>
      1 1         1         PLE       0     SH.28… PLE.… 0.182    6.055      8.06  0.824 2983… 3091… 1076… 7424…
      2 10        10        PLE       14    SH.28… PLE.… 0.142    2.455      7.91  0.64  2082… 2168… 8623… 5547…
      3 11        11        PLE       14    SH.28… PLE.… 0.165    2.428      7.91  0.646 2511… 2601… 9021… 5820…
      # … with 47 more rows, and 14 more variables: NH4 <chr>, NO3 <chr>, Al <chr>, Ca <chr>, Cd <chr>, Cu <chr>,
      #   Fe <chr>, Mg <chr>, Mn <chr>, P <chr>, …
    phy_tree:
      
      Phylogenetic tree with 680 tips and 678 internal nodes.
      
      Tip labels:
      	ASV1, ASV2, ASV3, ASV4, ASV5, ASV6, ...
      Node labels:
      	, 0.872, 0.889, 0.000, 0.737, 0.857, ...
      
      Unrooted; includes branch lengths.
  0 functions:
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=de_AT.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_AT.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=de_AT.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=de_AT.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] metacoder_0.3.3 taxa_0.3.2      phyloseq_1.28.0

loaded via a namespace (and not attached):
 [1] tidyselect_0.2.5    reshape2_1.4.3      purrr_0.3.2         splines_3.6.1       lattice_0.20-38     rhdf5_2.28.1       
 [7] colorspace_1.4-1    vctrs_0.2.0         stats4_3.6.1        mgcv_1.8-29         utf8_1.1.4          survival_2.44-1.1  
[13] rlang_0.4.0         pillar_1.4.2        glue_1.3.1          BiocGenerics_0.30.0 foreach_1.4.7       plyr_1.8.4         
[19] stringr_1.4.0       zlibbioc_1.30.0     Biostrings_2.52.0   munsell_0.5.0       gtable_0.3.0        codetools_0.2-16   
[25] Biobase_2.44.0      permute_0.9-5       IRanges_2.18.3      biomformat_1.12.0   parallel_3.6.1      fansi_0.4.0        
[31] Rcpp_1.0.2          scales_1.0.0        backports_1.1.5     vegan_2.5-6         S4Vectors_0.22.1    jsonlite_1.6       
[37] XVector_0.24.0      ggplot2_3.2.1       stringi_1.4.3       dplyr_0.8.3         grid_3.6.1          ade4_1.7-13        
[43] cli_1.1.0           tools_3.6.1         magrittr_1.5        lazyeval_0.2.2      tibble_2.1.3        cluster_2.1.0      
[49] crayon_1.3.4        ape_5.3             pkgconfig_2.0.3     zeallot_0.1.0       MASS_7.3-51.4       Matrix_1.2-17      
[55] data.table_1.12.4   assertthat_0.2.1    rstudioapi_0.10     iterators_1.0.12    Rhdf5lib_1.6.2      R6_2.4.0           
[61] multtest_2.40.0     igraph_1.2.4.1      nlme_3.1-141        compiler_3.6.1
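
A small sketch of how the two sessions can be compared, using the "other attached packages" lines from the two sessionInfo() outputs in this thread. The variable names are made up for illustration, and the result only narrows the candidates; it does not show which package, if any, is responsible:

```r
# "other attached packages" copied from the two sessionInfo() outputs above.
attached_failing <- c("metacoder", "taxa", "phyloseq", "dada2", "Rcpp")
attached_working <- c("metacoder", "taxa", "phyloseq")

# Packages attached only in the session that showed NA.
extra_attached <- setdiff(attached_failing, attached_working)
print(extra_attached)
#> [1] "dada2" "Rcpp"

# In a live session, the same kind of vector can be taken from sessionInfo();
# it is NULL when no non-base packages are attached.
attached_now <- names(sessionInfo()$otherPkgs)
```

The same setdiff() comparison could be applied to the much longer "loaded via a namespace" lists, which also differ between the two sessions.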

FFoQS90 avatar Oct 16 '19 18:10 FFoQS90

Hi! I noticed that in the example above you have 1284 taxa, but only 680 ASVs. Any idea why there are more taxa than ASVs? I am running into the same problem with my dataset. Thanks!

anaeiou avatar Apr 09 '21 19:04 anaeiou

Hello @anaeiou,

The number of taxa and ASVs can be different. All ASVs could be assigned to a single taxon for example, or a single ASV could have a deep classification. For example, an ASV with the classification:

"Root;Actinobacteria;Actinobacteria;Actinomycetales;Nocardiaceae;Rhodococcus"

would be assigned to 6 taxa.

1284 taxa and 680 ASVs does not indicate a problem. I hope this clears things up some!
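
As a rough illustration in base R, the sketch below counts the distinct taxa implied by a few made-up classifications. The lineages are hypothetical (not from the data set above), and it assumes a taxon is identified by its full path from the root, which may not match exactly how metacoder assigns taxon IDs:

```r
# Hypothetical classifications for three ASVs (not taken from the data set above).
lineages <- c(
  ASV1 = "Root;Actinobacteria;Actinobacteria;Actinomycetales;Nocardiaceae;Rhodococcus",
  ASV2 = "Root;Actinobacteria;Actinobacteria;Actinomycetales;Nocardiaceae;Nocardia",
  ASV3 = "Root;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus"
)

# Assumption: a taxon is identified by its full path from the root, so the
# class Actinobacteria is distinct from the phylum of the same name.
taxon_paths <- function(lineage) {
  ranks <- strsplit(lineage, ";", fixed = TRUE)[[1]]
  vapply(seq_along(ranks),
         function(i) paste(ranks[seq_len(i)], collapse = ";"),
         character(1))
}

# Pool the paths from every ASV and keep each taxon once.
all_taxa <- unique(unlist(lapply(lineages, taxon_paths)))

length(lineages)  # number of ASVs: 3
length(all_taxa)  # number of distinct taxa: 12
```

Here three ASVs give twelve taxa because every rank along each lineage counts as its own taxon, while ranks shared between ASVs are counted once.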

zachary-foster avatar Apr 12 '21 19:04 zachary-foster

Thanks for the clarification! That makes sense. I was confused because in your tutorial (awesome resource, thank you so much for it!) you have 1,000 OTUs and they translate into 174 taxa even though each one has a deep classification. Do you know why this might be? Either way, good to know I am (probably) not doing anything wrong :)

anaeiou avatar Apr 12 '21 21:04 anaeiou

No problem! It depends on the data set and the taxonomy used. In the case of the example data:

library(metacoder)
x = parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                   class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                   class_regex = "^(.+)__(.+)$")
heat_tree(x, node_label = paste0(taxon_names, ' (', n_obs, ')'), node_size = n_obs, node_color = n_obs)

Created on 2021-04-12 by the reprex package (v0.3.0)

Note that many of the genera have lots of OTUs assigned to them (e.g., Bacteroides with 115).

zachary-foster avatar Apr 12 '21 21:04 zachary-foster

Got it! Thanks again

anaeiou avatar Apr 12 '21 21:04 anaeiou

Going back to the issue of displaying NAs:

I've been having this problem too. Unfortunately, I'm not able to reproduce it using reprex. However, I've figured out that this is only a problem for me when displaying taxmap objects within an R notebook but not in the console. Maybe this will be helpful for future debugging.

image

levlitichev avatar Feb 10 '22 20:02 levlitichev

Yeah, that looks like it is due to how R Markdown is rendered in RStudio. Since that is an interaction with RStudio, I am not sure how to fix it, but I will keep on the lookout for a solution. You can avoid that issue if you change this setting:

image

Which will cause the output to be printed to the console.

zachary-foster avatar Feb 11 '22 22:02 zachary-foster