F1000_workflow
F1000_workflow copied to clipboard
How do you define contrasts in the hierarchical multiple testing?
Hi-
Within DESeq you normally apply some sort of contrast with the results function i.e. (from: https://www.bioconductor.org/packages/release/bioc/vignettes/phyloseq/inst/doc/phyloseq-mixture-models.html): ddsTEST <- phyloseq_to_deseq2(ps.glom, ~ SeasonLoc) ddsTEST = estimateSizeFactors(ddsTEST, geoMeans=geoMeans) ddsTEST <- DESeq(ddsTEST, fitType="parametric") resT = results(ddsTEST, contrast=c("SeasonLoc", "PermDry", "PermWet"))
Which then will give you log2fold changes and adjusted p-values between your defined groups (i.e. "PermDry" compared to "PermWet" in this example).
I understand with the hierarchical multiple testing it's accounting for higher level structure in the data based on phylogeny, but I don't see how I define which groups the procedure compares. In the workflow, which age bins is it finding have different Lachnospiraceae abundance as there are 3 bins (0-100, 100-200, 200-400)? Is there any sort of directionality you can get from the output (increasing/decreasing abundance)?
Let me know if I'm being unclear or you need more info.
Thanks for any help, Ben
Ben I think your question is about the structSSI component of the test which allows you to test different levels of taxonomy (ie is there a difference at the phyla level, is there a difference at the genus level for the phyla that are significantly different..etc).
Reading the complete article about testing with hierarchical structure https://www.jstatsoft.org/article/view/v059i13
When doing testing for more than two groups, the only conclusion in a first round of tests is: there is a difference, the extra testing is done afterwards but looking at boxplots of the significant taxa gives the direction.
Susan
On Tue, Jun 6, 2017 at 9:12 AM, benkraj [email protected] wrote:
Hi-
Within DESeq you normally apply some sort of contrast with the results function i.e.: ddsTEST <- phyloseq_to_deseq2(ps.glom, ~ SeasonLoc) resT = results(ddsTEST, contrast=c("SeasonLoc", "PermDry", "PermWet"))
Which then will give you log2fold changes and adjusted p-values between your defined groups (i.e. "PermDry" compared to "PermWet" in this example).
I understand with the hierarchical multiple testing it's accounting for higher level structure in the data based on phylogeny, but I don't see how I define which groups the procedure compares. In the workflow, which age bins is it finding have different Lachnospiraceae abundance as there are 3 bins (0-100, 100-200, 200-400)? Is there any sort of directionality you can get from the output (increasing/decreasing abundance)?
Let me know if I'm being unclear or you need more info.
Thanks for any help, Ben
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/spholmes/F1000_workflow/issues/18, or mute the thread https://github.com/notifications/unsubscribe-auth/ABJcvXVzsTI5kHCr4JcX2zvwLJCWY9R0ks5sBXp9gaJpZM4Nxk1V .
-- Susan Holmes Professor, Statistics and BioX John Henry Samter Fellow in Undergraduate Education Sequoia Hall, 390 Serra Mall Stanford, CA 94305 http://www-stat.stanford.edu/~susan/
Hi Susan-
Thanks for your response.
Yes I am after the structSSI portion of the analysis. I have read through that paper, and I think maybe I follow now. Is it best to define your group comparison (or rounds of group comparisons) using the treePValues command's "environments"?
i.e. First round (comparing everything):
data("chlamydiae", package = "structSSI")
environments <- sample_data(chlamydiae)$SampleType
abundances <- otu_table(chlamydiae)
chl.tree <- get.edgelist(as.igraph(phy_tree(chlamydiae)))
chl.pval <- treePValues(chl.tree, abundances, environments)
chl.hfdr <- hFDR.adjust(chl.pval, chl.tree, alpha = 0.1)
Then say I want to compare only "Soil" vs "Skin" samples, I would just define a new environment, i.e.:
enviro2 <- as.factor(c("Soil", "Skin"))
chl.pval2 <- treePValues(chl.tree, abundances, enviro2)
chl.hfdr2 <- hFDR.adjust(chl.pval2, chl.tree, alpha = 0.1)
This would then be analogous to the contrasts in DESeq? It seems like it works, and then I wouldn't need to do any sort of phyloseq modification, i.e. subsetting the phyloseq to just my Soil and Skin samples.
In regards to the boxplots/ways to get fold change direction, if I understand correctly, I then could use basically what I had been before with DESeq to get the fold change information, but replace the p/adjusted p-values it calculates with the hierarchical multiple testing adjusted ones? Are there alternative helper functions I could use for this?
Does this seem to be correct in the approach?
Thanks, Ben
Ben Your approach is the best one, sorry not many helpers as everyone tailors their use of the tools to a different workflow so not much help having a completely general tool, but seems like your approach is good. If you are testing many different subsets for the same data, you may have to cost yourself a little for multiple testing too. Susan
On Wed, Jun 7, 2017 at 1:24 PM, benkraj [email protected] wrote:
Hi Susan-
Thanks for your response.
Yes I am after the structSSI portion of the analysis. I have read through that paper, and I think maybe I follow now. Is it best to define your group comparison (or rounds of group comparisons) using the treePValues command's "environments"?
i.e. First round (comparing everything):
data("chlamydiae", package = "structSSI") environments <- sample_data(chlamydiae)$SampleType abundances <- otu_table(chlamydiae) chl.tree <- get.edgelist(as.igraph(phy_tree(chlamydiae))) chl.pval <- treePValues(chl.tree, abundances, environments) chl.hfdr <- hFDR.adjust(chl.pval, chl.tree, alpha = 0.1)
Then say I want to compare only "Soil" vs "Skin" samples, I would just define a new environment, i.e.:
enviro2 <- as.factor(c("Soil", "Skin")) chl.pval2 <- treePValues(chl.tree, abundances, enviro2) chl.hfdr2 <- hFDR.adjust(chl.pval2, chl.tree, alpha = 0.1)
This would then be analogous to the contrasts in DESeq? It seems like it works, and then I wouldn't need to do any sort of phyloseq modification, i.e. subsetting the phyloseq to just my Soil and Skin samples.
In regards to the boxplots/ways to get fold change direction, if I understand correctly, I then could use basically what I had been before with DESeq to get the fold change information, but replace the p/adjusted p-values it calculates with the hierarchical multiple testing adjusted ones? Are there alternative helper functions I could use for this?
Does this seem to be correct in the approach?
Thanks, Ben
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/spholmes/F1000_workflow/issues/18#issuecomment-306913859, or mute the thread https://github.com/notifications/unsubscribe-auth/ABJcvYl4TsK4JqiK4RKpNuXAl4idPvpYks5sBwbhgaJpZM4Nxk1V .
-- Susan Holmes Professor, Statistics and BioX John Henry Samter Fellow in Undergraduate Education Sequoia Hall, 390 Serra Mall Stanford, CA 94305 http://www-stat.stanford.edu/~susan/