metacoder
How to filter non-significant odd named taxa, and only keep the significant odd named taxa?
Hi there!
I've been using metacoder::filter_taxa(grepl(taxon_names, pattern = "^[a-zA-Z]+$"), reassign_obs = FALSE) %>%
to remove odd taxa, but some of the odd named taxa are significant and I would like them to be displayed on the tree.
Is there a way to only display the significant odd named taxa?
What do you mean by significant? Can you give me an example? You can make a list of taxa you want to be displayed no matter what and do this:
metacoder::filter_taxa(grepl(taxon_names, pattern = "^[a-zA-Z]+$") | taxon_names %in% my_taxon_name_list, reassign_obs = FALSE)
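For illustration, here is a small base R sketch of how that combined condition behaves. The taxon names and the contents of my_taxon_name_list are made up for the example; inside filter_taxa, taxon_names would come from the taxmap object instead.

```r
# Toy taxon names (hypothetical); two of them have "odd" names
taxon_names <- c("Bacteroides", "UCG-002", "Prevotella", "CAG-56")

# Hypothetical list of odd-named taxa to keep no matter what
my_taxon_name_list <- c("UCG-002")

# TRUE for names made up only of letters
is_clean <- grepl(taxon_names, pattern = "^[a-zA-Z]+$")

# Keep a taxon if its name is clean OR it is on the list
keep <- is_clean | taxon_names %in% my_taxon_name_list

taxon_names[keep]
```

The odd-named taxon on the list survives the filter, while the other odd-named taxon is dropped.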
Statistical significance after correcting for multiple comparisons. This is what I did:
Create a new column called wilcox_p_value_p.adjusted to correct for multiple comparisons:
obj$data$diff_table$wilcox_p_value_p.adjusted <- p.adjust(obj$data$diff_table$wilcox_p_value,
method = "fdr")
Create a new column in diff_table that starts out as a copy of log2_median_ratio, so that the values whose adjusted Wilcoxon p-value is not significant can then be zeroed out:
obj$data$diff_table$log2_median_ratio_wilcox.adjust <- obj$data$diff_table$log2_median_ratio
Then modify this new column to set the non-significant values to 0:
obj$data$diff_table$log2_median_ratio_wilcox.adjust[obj$data$diff_table$wilcox_p_value_p.adjusted > 0.05] <- 0
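The same three steps can be tried on a toy table to see what they do. This is only a sketch: the data frame below is a made-up stand-in for obj$data$diff_table, and only the column names are taken from the code above.

```r
# Toy stand-in for obj$data$diff_table (all values are made up)
diff_table <- data.frame(
  taxon_id = c("ab", "ac", "ad", "ae"),
  log2_median_ratio = c(2.5, -1.2, 0.4, -3.1),
  wilcox_p_value = c(0.001, 0.04, 0.60, 0.02)
)

# FDR correction ("fdr" is an alias for "BH" in p.adjust)
diff_table$wilcox_p_value_p.adjusted <- p.adjust(diff_table$wilcox_p_value,
                                                 method = "fdr")

# Copy the ratio column, then zero out the non-significant rows
diff_table$log2_median_ratio_wilcox.adjust <- diff_table$log2_median_ratio
not_signif <- diff_table$wilcox_p_value_p.adjusted > 0.05
diff_table$log2_median_ratio_wilcox.adjust[not_signif] <- 0

# Every row is still present; only the value used for coloring changed
nrow(diff_table)
diff_table
```

Note that zeroing a value changes how a taxon would be colored, but it does not remove the taxon from the table.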
Then I created the tree at the genus level, to display only the taxa that are significant after correcting for multiple comparisons:
set.seed(1)
obj %>%
metacoder::filter_taxa(taxon_ranks == "g", supertaxa = TRUE, reassign_obs = FALSE) %>%
metacoder::filter_taxa(grepl(taxon_names, pattern = "^[a-zA-Z]+$"), reassign_obs = FALSE) %>%
heat_tree_matrix(
data = "diff_table",
node_size = n_obs,
node_label = taxon_names,
node_color = log2_median_ratio_wilcox.adjust,
node_color_range = diverging_palette(),
node_color_trans = "linear",
node_color_interval = c(-8, 8),
edge_color_interval = c(-8, 8),
node_size_axis_label = "Number of OTUs",
node_color_axis_label = "Log2 ratio median proportions",
layout = "davidson-harel",
initial_layout = "reingold-tilford",
output_file = "diff tree.pdf")
Let me know if I am doing anything wrong.
Ok, I understand now. Thanks for the code! I see that you set the non-significant taxa to 0, but I don't see where you are filtering them out. Either way, if you want to remove any taxa with odd names that are not significant, you can do something like:
metacoder::filter_taxa(! (wilcox_p_value_p.adjusted > 0.05 & ! grepl(taxon_names, pattern = "^[a-zA-Z]+$")), reassign_obs = FALSE)
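To make the logic of that condition easier to follow, here is a base R sketch on made-up vectors (the values and the helper names p_adj and clean_name are hypothetical). It shows that the condition keeps a taxon if it is significant or has a clean name, so only taxa that are both non-significant and odd-named are dropped.

```r
# Made-up adjusted p-values and taxon names, one entry per taxon
p_adj <- c(0.01, 0.30, 0.01, 0.30)
taxon_names <- c("Prevotella", "Bacteroides", "UCG-002", "CAG-56")
clean_name <- grepl(taxon_names, pattern = "^[a-zA-Z]+$")

# Same shape as the condition above: drop if non-significant AND odd-named
keep_a <- !(p_adj > 0.05 & !clean_name)

# Equivalent form by De Morgan's law: keep if significant OR clean-named
keep_b <- p_adj <= 0.05 | clean_name

identical(keep_a, keep_b)
taxon_names[keep_a]
```

This sketch assumes there are no missing p-values; with NA values the two vectors would contain NA and would need separate handling.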
Thanks for that, but unfortunately I get this error when I replace
metacoder::filter_taxa(grepl(taxon_names, pattern = "^[a-zA-Z]+$"), reassign_obs = FALSE) %>%
with
metacoder::filter_taxa(! (wilcox_p_value_p.adjusted > 0.05 & ! grepl(taxon_names, pattern = "^[a-zA-Z]+$")), reassign_obs = FALSE)
Error: TRUE/FALSE vector (length = 1452) must be the same length as the number of taxa (242)
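One possible reading of this error, offered as a guess rather than a diagnosis: 1452 is exactly 6 × 242, which would be consistent with diff_table holding one row per taxon for each of six pairwise comparisons, so a column from it is too long to serve as a per-taxon condition. The base R sketch below uses made-up data to show one way of collapsing such a table to a single TRUE/FALSE per taxon. The rule used here (significant in at least one comparison) and the taxon_ids vector are assumptions for illustration.

```r
# Made-up table: 3 taxa x 2 comparisons = 6 rows
diff_table <- data.frame(
  taxon_id = rep(c("ab", "ac", "ad"), times = 2),
  wilcox_p_value_p.adjusted = c(0.01, 0.40, 0.20, 0.30, 0.70, 0.03)
)
taxon_ids <- c("ab", "ac", "ad")  # one entry per taxon

# One logical per row: longer than the number of taxa
row_signif <- diff_table$wilcox_p_value_p.adjusted <= 0.05
length(row_signif)

# Collapse to one value per taxon: significant in any comparison?
any_signif <- tapply(row_signif, diff_table$taxon_id, any)

# Reorder to match the taxon order and drop the names
taxon_signif <- as.vector(any_signif[taxon_ids])
length(taxon_signif)
```

A vector like taxon_signif has one element per taxon, which is the length a per-taxon filter condition needs.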
Oh, did I do something wrong? I thought I had filtered them out with this line:
obj$data$diff_table$log2_median_ratio_wilcox.adjust[obj$data$diff_table$wilcox_p_value_p.adjusted > 0.05] <- 0
My thinking was that setting the non-significant values to 0 and then passing that column to node_color would filter out the non-significant taxa.
Somehow it looked like they were filtered out in my tree when I did this:
set.seed(1)
obj %>%
metacoder::filter_taxa(taxon_ranks == "g", supertaxa = TRUE, reassign_obs = FALSE) %>%
metacoder::filter_taxa(grepl(taxon_names, pattern = "^[a-zA-Z]+$"), reassign_obs = FALSE) %>%
heat_tree_matrix(
data = "diff_table",
node_size = n_obs,
node_label = taxon_names,
node_color = log2_median_ratio_wilcox.adjust,
node_color_range = diverging_palette(),
node_color_trans = "linear",
node_color_interval = c(-8, 8),
edge_color_interval = c(-8, 8),
node_size_axis_label = "Number of OTUs",
node_color_axis_label = "Log2 ratio median proportions",
layout = "davidson-harel",
initial_layout = "reingold-tilford",
output_file = "diff tree.pdf")
Can you send me an example data set with associated code that reproduces the issue? It's hard for me to debug without reproducing the error.
Sorry, dumb question, but how do I send example data?
My original data file is huge as it's a qza file from QIIME2 analysis and I'm not sure what I need to do to it.
No problem, it's a common question.
If you can reproduce the error with a subset of the data, you can attach the files to this issue. You can save the needed R objects to a file with saveRDS
at the point just before the example code starts. You can also email the original data to [email protected]
if you don't want it public and it's small enough to email.
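As a minimal sketch of that save step: in base R, saveRDS writes a single object to a file and readRDS loads it back. The object and file name below are hypothetical stand-ins; in practice the object would be the one used at the start of the example code, such as obj.

```r
# Stand-in for the real object (made-up contents)
example_obj <- list(taxon_names = c("Prevotella", "UCG-002"),
                    p_adj = c(0.01, 0.30))

# Hypothetical file name; a temporary directory is used for this sketch
rds_path <- file.path(tempdir(), "example_obj.rds")

# Write the object to disk ...
saveRDS(example_obj, file = rds_path)

# ... and check that it reads back unchanged
restored <- readRDS(rds_path)
identical(example_obj, restored)
```

The resulting .rds file is what would be attached or emailed, along with the code that loads it and reproduces the error.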
Thanks, I just emailed it to you! I'm not sure if I did it correctly.