F1000_workflow icon indicating copy to clipboard operation
F1000_workflow copied to clipboard

How do you define contrasts in the hierarchical multiple testing?

Open benkraj opened this issue 7 years ago • 3 comments

Hi-

Within DESeq you normally apply some sort of contrast with the results function i.e. (from: https://www.bioconductor.org/packages/release/bioc/vignettes/phyloseq/inst/doc/phyloseq-mixture-models.html): ddsTEST <- phyloseq_to_deseq2(ps.glom, ~ SeasonLoc) ddsTEST = estimateSizeFactors(ddsTEST, geoMeans=geoMeans) ddsTEST <- DESeq(ddsTEST, fitType="parametric") resT = results(ddsTEST, contrast=c("SeasonLoc", "PermDry", "PermWet"))

Which then will give you log2fold changes and adjusted p-values between your defined groups (i.e. "PermDry" compared to "PermWet" in this example).

I understand with the hierarchical multiple testing it's accounting for higher level structure in the data based on phylogeny, but I don't see how I define which groups the procedure compares. In the workflow, which age bins is it finding have different Lachnospiraceae abundance as there are 3 bins (0-100, 100-200, 200-400)? Is there any sort of directionality you can get from the output (increasing/decreasing abundance)?

Let me know if I'm being unclear or you need more info.

Thanks for any help, Ben

benkraj avatar Jun 06 '17 16:06 benkraj

Ben I think your question is about the structSSI component of the test which allows you to test different levels of taxonomy (ie is there a difference at the phyla level, is there a difference at the genus level for the phyla that are significantly different..etc).

Reading the complete article about testing with hierarchical structure https://www.jstatsoft.org/article/view/v059i13

When doing testing for more than two groups, the only conclusion in a first round of tests is: there is a difference, the extra testing is done afterwards but looking at boxplots of the significant taxa gives the direction.

Susan

On Tue, Jun 6, 2017 at 9:12 AM, benkraj [email protected] wrote:

Hi-

Within DESeq you normally apply some sort of contrast with the results function i.e.: ddsTEST <- phyloseq_to_deseq2(ps.glom, ~ SeasonLoc) resT = results(ddsTEST, contrast=c("SeasonLoc", "PermDry", "PermWet"))

Which then will give you log2fold changes and adjusted p-values between your defined groups (i.e. "PermDry" compared to "PermWet" in this example).

I understand with the hierarchical multiple testing it's accounting for higher level structure in the data based on phylogeny, but I don't see how I define which groups the procedure compares. In the workflow, which age bins is it finding have different Lachnospiraceae abundance as there are 3 bins (0-100, 100-200, 200-400)? Is there any sort of directionality you can get from the output (increasing/decreasing abundance)?

Let me know if I'm being unclear or you need more info.

Thanks for any help, Ben

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/spholmes/F1000_workflow/issues/18, or mute the thread https://github.com/notifications/unsubscribe-auth/ABJcvXVzsTI5kHCr4JcX2zvwLJCWY9R0ks5sBXp9gaJpZM4Nxk1V .

-- Susan Holmes Professor, Statistics and BioX John Henry Samter Fellow in Undergraduate Education Sequoia Hall, 390 Serra Mall Stanford, CA 94305 http://www-stat.stanford.edu/~susan/

spholmes avatar Jun 07 '17 15:06 spholmes

Hi Susan-

Thanks for your response.

Yes I am after the structSSI portion of the analysis. I have read through that paper, and I think maybe I follow now. Is it best to define your group comparison (or rounds of group comparisons) using the treePValues command's "environments"?

i.e. First round (comparing everything):

data("chlamydiae", package = "structSSI")
environments <- sample_data(chlamydiae)$SampleType
abundances <- otu_table(chlamydiae)
chl.tree <- get.edgelist(as.igraph(phy_tree(chlamydiae)))
chl.pval <- treePValues(chl.tree, abundances, environments)
chl.hfdr <- hFDR.adjust(chl.pval, chl.tree, alpha = 0.1)

Then say I want to compare only "Soil" vs "Skin" samples, I would just define a new environment, i.e.:

enviro2 <- as.factor(c("Soil", "Skin"))
chl.pval2 <- treePValues(chl.tree, abundances, enviro2)
chl.hfdr2 <- hFDR.adjust(chl.pval2, chl.tree, alpha = 0.1)

This would then be analogous to the contrasts in DESeq? It seems like it works, and then I wouldn't need to do any sort of phyloseq modification, i.e. subsetting the phyloseq to just my Soil and Skin samples.

In regards to the boxplots/ways to get fold change direction, if I understand correctly, I then could use basically what I had been before with DESeq to get the fold change information, but replace the p/adjusted p-values it calculates with the hierarchical multiple testing adjusted ones? Are there alternative helper functions I could use for this?

Does this seem to be correct in the approach?

Thanks, Ben

benkraj avatar Jun 07 '17 20:06 benkraj

Ben Your approach is the best one, sorry not many helpers as everyone tailors their use of the tools to a different workflow so not much help having a completely general tool, but seems like your approach is good. If you are testing many different subsets for the same data, you may have to cost yourself a little for multiple testing too. Susan

On Wed, Jun 7, 2017 at 1:24 PM, benkraj [email protected] wrote:

Hi Susan-

Thanks for your response.

Yes I am after the structSSI portion of the analysis. I have read through that paper, and I think maybe I follow now. Is it best to define your group comparison (or rounds of group comparisons) using the treePValues command's "environments"?

i.e. First round (comparing everything):

data("chlamydiae", package = "structSSI") environments <- sample_data(chlamydiae)$SampleType abundances <- otu_table(chlamydiae) chl.tree <- get.edgelist(as.igraph(phy_tree(chlamydiae))) chl.pval <- treePValues(chl.tree, abundances, environments) chl.hfdr <- hFDR.adjust(chl.pval, chl.tree, alpha = 0.1)

Then say I want to compare only "Soil" vs "Skin" samples, I would just define a new environment, i.e.:

enviro2 <- as.factor(c("Soil", "Skin")) chl.pval2 <- treePValues(chl.tree, abundances, enviro2) chl.hfdr2 <- hFDR.adjust(chl.pval2, chl.tree, alpha = 0.1)

This would then be analogous to the contrasts in DESeq? It seems like it works, and then I wouldn't need to do any sort of phyloseq modification, i.e. subsetting the phyloseq to just my Soil and Skin samples.

In regards to the boxplots/ways to get fold change direction, if I understand correctly, I then could use basically what I had been before with DESeq to get the fold change information, but replace the p/adjusted p-values it calculates with the hierarchical multiple testing adjusted ones? Are there alternative helper functions I could use for this?

Does this seem to be correct in the approach?

Thanks, Ben

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/spholmes/F1000_workflow/issues/18#issuecomment-306913859, or mute the thread https://github.com/notifications/unsubscribe-auth/ABJcvYl4TsK4JqiK4RKpNuXAl4idPvpYks5sBwbhgaJpZM4Nxk1V .

-- Susan Holmes Professor, Statistics and BioX John Henry Samter Fellow in Undergraduate Education Sequoia Hall, 390 Serra Mall Stanford, CA 94305 http://www-stat.stanford.edu/~susan/

spholmes avatar Jun 07 '17 20:06 spholmes