metacoder icon indicating copy to clipboard operation
metacoder copied to clipboard

Export p-values with heircachy amd taxanomic names?

Open catherineel opened this issue 3 years ago • 2 comments

Hi there!

I was wondering, is there a way to export the stats generated from the heat trees? Based on your tutorial we can get the p-values for the heat tree using:

obj$data$diff_table$wilcox_p_value <- p.adjust(obj$data$diff_table$wilcox_p_value,
                                               method = "fdr")

obj$data$diff_table <- compare_groups(obj, dataset = "tax_abund",
                                      cols = hmp_samples$sample_id, # What columns of sample data to use
                                      groups = hmp_samples$body_site) # What category each sample is assigned to
print(obj$data$diff_table)

When I tried:

write.xlsx(obj$data$diff_table, "obj$data$diff_table_padj.xlsx")

I get the taxon_id as 3 letter codes and I can't tell which taxa they belong to and their names. Examples: aab aac aad aae aaf aag aah aai aaj

Is there a way to generate and export an excel file that has the taxonomic name with its p-values?

catherineel avatar Jan 20 '22 02:01 catherineel

Has anyone figured out how to do this yet?

tboonf avatar Apr 06 '23 11:04 tboonf

The taxon names can be accessed using the taxon_names() function. In functions that expect column names, like get_data_frame, functions like taxon_names() can be referenced and their output treated like a variable. For example:

library(metacoder)
library(dplyr)

x = parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                   class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                   class_regex = "^(.+)__(.+)$")

# Convert counts to proportions
x$data$otu_table <- calc_obs_props(x, data = "tax_data", cols = hmp_samples$sample_id)
#> Calculating proportions from counts for 50 columns for 1000 observations.

# Get per-taxon counts
x$data$tax_table <- calc_taxon_abund(x, data = "otu_table", cols = hmp_samples$sample_id)
#> Summing per-taxon counts from 50 columns for 174 taxa

# Calculate difference between groups
x$data$diff_table <- compare_groups(x, data = "tax_table",
                                    cols = hmp_samples$sample_id,
                                    groups = hmp_samples$body_site)

# Make table of per-taxon data
per_tax_data <- get_data_frame(x, name = c("taxon_ids", "taxon_names", "taxon_ranks", "classifications"))

# Combine with per-comparison data
comp_data <- left_join(x$data$diff_table, per_tax_data, by = c("taxon_id" = "taxon_ids"))

print(comp_data)
#> # A tibble: 1,740 × 10
#>    taxon_id treatme…¹ treat…² log2_…³ media…⁴ mean_d…⁵ wilcox_…⁶ taxon…⁷ taxon…⁸
#>    <chr>    <chr>     <chr>     <dbl>   <dbl>    <dbl>     <dbl> <chr>   <chr>  
#>  1 ab       Nose      Saliva     0     0       0       NaN       Root    r      
#>  2 ac       Nose      Saliva    -2.52 -0.170  -1.27e-1   6.84e-3 Proteo… p      
#>  3 ad       Nose      Saliva    -5.42 -0.267  -2.62e-1   1.08e-5 Bacter… p      
#>  4 ae       Nose      Saliva     5.12  0.597   5.81e-1   1.08e-5 Actino… p      
#>  5 af       Nose      Saliva    -1.01 -0.225  -1.47e-1   5.24e-2 Firmic… p      
#>  6 ag       Nose      Saliva  -Inf    -0.0249 -4.48e-2   1.49e-4 Fusoba… p      
#>  7 ah       Nose      Saliva     0     0      -1.02e-4   7.79e-2 Teneri… p      
#>  8 ai       Nose      Saliva     0     0       0       NaN       Spiroc… p      
#>  9 aj       Nose      Saliva    -2.65 -0.0954 -7.73e-2   5.20e-3 Gammap… c      
#> 10 ak       Nose      Saliva    -6.41 -0.0192 -1.96e-2   1.26e-3 Flavob… c      
#> # … with 1,730 more rows, 1 more variable: classifications <chr>, and
#> #   abbreviated variable names ¹​treatment_1, ²​treatment_2, ³​log2_median_ratio,
#> #   ⁴​median_diff, ⁵​mean_diff, ⁶​wilcox_p_value, ⁷​taxon_names, ⁸​taxon_ranks

Created on 2023-04-06 with reprex v2.0.2

zachary-foster avatar Apr 06 '23 17:04 zachary-foster