metacoder icon indicating copy to clipboard operation
metacoder copied to clipboard

calc_diff_abund_deseq2

Open rogarcia1992 opened this issue 5 years ago • 12 comments

Hi,

First of all thanks for developing this amazing package! I would like to use DeSeq2 negative binomial analysis for differential abundance and then use compare_groups and lastly plot a metacoder heat tree matrix. I tried the calc_diff_abund_deseq2 but it did not work. Any suggestions?

rogarcia1992 avatar Aug 25 '20 20:08 rogarcia1992

Thanks!

Can you attach some code and input data that reproduces the problem so I can figure out what is going on?

zachary-foster avatar Aug 28 '20 00:08 zachary-foster

FT_A_OTUTAX_Table.csv https://drive.google.com/file/d/1BOGGqynRCW7nrMyWgw-4J8oCE8fu3_NP/view?usp=drive_web Hi Dr. Foster,

Thank you for your quick response! See code and input data attached below.

Best regards, Ray.

On Thu, Aug 27, 2020 at 8:08 PM Zachary Foster [email protected] wrote:

Thanks!

Can you attach some code and input data that reproduces the problem so I can figure out what is going on?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/grunwaldlab/metacoder/issues/291#issuecomment-682252167, or unsubscribe https://github.com/notifications/unsubscribe-auth/AQXLAHVOONO72ELPJYBLP4TSC3YP3ANCNFSM4QLCLHBQ .

rogarcia1992 avatar Aug 28 '20 17:08 rogarcia1992

Sorry for the delay! I forgot about this for a while. I downloaded the CSV file, but I don't see any code?

zachary-foster avatar Sep 10 '20 22:09 zachary-foster

Hi,

See code attached below. Thanks!

On Thu, Sep 10, 2020 at 6:54 PM Zachary Foster [email protected] wrote:

Sorry for the delay! I forgot about this for a while. I downloaded the CSV file, but I don't see any code?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/grunwaldlab/metacoder/issues/291#issuecomment-690773637, or unsubscribe https://github.com/notifications/unsubscribe-auth/AQXLAHWEEP7IR3ZHURK4SHTSFFKKJANCNFSM4QLCLHBQ .

rogarcia1992 avatar Sep 10 '20 23:09 rogarcia1992

I am still not seeing the code. Did you reply by email? Maybe github does not include email attachments?

zachary-foster avatar Sep 11 '20 22:09 zachary-foster

Yes, I replied by e-mail. I will send it via github and see if it works. Sorry for the inconvenience.

On Fri, Sep 11, 2020 at 6:55 PM Zachary Foster [email protected] wrote:

I am still not seeing the code. Did you reply by email? Maybe github does not include email attachments?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/grunwaldlab/metacoder/issues/291#issuecomment-691345387, or unsubscribe https://github.com/notifications/unsubscribe-auth/AQXLAHV72VLVD3LIUGVQDY3SFKTHLANCNFSM4QLCLHBQ .

rogarcia1992 avatar Sep 11 '20 23:09 rogarcia1992

##Create Taxmap Object FT_A_Tax_Map = parse_tax_data(FT_A_OTUTAX_Table, class_cols = c("Root", "Domain", "Phylum", "Class", "Order", "Family", "Genus"), class_sep = " ", class_key = c(tax_name = "taxon_name")) Sample_Data_FT_A

##Remove undesired and low abundance taxa. Perform this if you use infiletered data FT_A_Tax_Map$data$tax_data <- zero_low_counts(FT_A_Tax_Map, data = "tax_data", min_count = 5) no_reads <- rowSums(FT_A_Tax_Map$data$tax_data[, Sample_Data_FT_A$Sample_Num]) == 0 sum(no_reads) FT_A_Tax_Map <- filter_obs(FT_A_Tax_Map, data = "tax_data", ! no_reads, drop_taxa = TRUE) FT_A_Tax_Map <- filter_ambiguous_taxa(FT_A_Tax_Map, subtaxa = TRUE)

###Get per-taxon counts FT_A_Tax_Map$data$tax_abund <- calc_taxon_abund(FT_A_Tax_Map, data = "tax_data", cols = Sample_Data_FT_A$Sample_Num)

###Calculate differences between groups and multiple test correction. FT_A_Tax_Map$data$diff_table <- calc_diff_abund_deseq2(FT_A_Tax_Map, data = "tax_abund", cols = Sample_Data_FT_A$Sample_Num, groups = Sample_Data_FT_A$Sample_ID) ##Error in calc_diff_abund_deseq2(FT_A_Tax_Map, data = "tax_abund", cols = Sample_Data_FT_A$Sample_Num, ##could not find function "calc_diff_abund_deseq2" ###print Taxmap object print(FT_A_Tax_Map)

1439 taxa: aab. Root, aac. NA, aad. Bacteria, aae. NA, aaf. Archaea ... dif. NA, dig. Gallicola, dim. stricto, div. NA 1439 edges: NA->aab, NA->aac, aab->aad, aac->aae, aab->aaf, aad->aag ... dcu->die, dgi->dif, dcu->dig, dhd->dim, dif->div 3 data sets: tax_data: # A tibble: 34,751 x 49 taxon_id S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 S15 S16 S17 S18 S19 1 bbx 18 229 0 33 11 55 0 220 0 39 44 5 0 13 2985 4301 6035 6108 8001 2 bby 0 0 0 0 0 9 0 7 0 17 0 5 0 0 42 24 92 279 159 3 bbz 0 0 0 0 6 0 0 0 0 0 0 0 0 0 25 0 8 0 0 # … with 34,748 more rows, and 29 more variables: S20 , S21 , S22 , S23 , S24 , S25 , # S26 , S27 , S28 , S29 , … tax_abund: # A tibble: 1,439 x 49 taxon_id S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 S15 S16 S17 S18 S19 1 aab 10007 8535 5 10403 17514 15353 10271 11319 13783 14818 19579 30450 21938 9284 19047 14692 33891 30480 25594 2 aac 44 6 0 11 110 21 29 41 11 33 0 23 5 0 76 7 66 8 72 3 aad 9951 8482 5 10368 17494 15321 10239 11282 13737 14808 19570 30385 21918 9278 19042 14676 33838 30419 25594 # … with 1,436 more rows, and 29 more variables: S20 , S21 , S22 , S23 , S24 , S25 , # S26 , S27 , S28 , S29 , … diff_table: # A tibble: 8,634 x 7 taxon_id treatment_1 treatment_2 log2_median_ratio median_diff mean_diff wilcox_p_value 1 aab HIDF DIDF -0.921 -11215 -19162. 0.0205 2 aac HIDF DIDF -0.447 -8 -13.6 0.750 3 aad HIDF DIDF -0.925 -11246. -19166. 0.0205 # … with 8,631 more rows 0 functions:

##Sample Data head(Sample_Data_FT_A)

A tibble: 6 x 20

Sample_Num Field_ID Sample_ID GPS_Location County Physical_Region Soil_Type Organic_Matter Bulk_Soil_pH CEC Na P K Ca Mg 1 S1 FT_A HIDF 35.60004101… Wilson Inner Coastal … Norfolk … 0.36 5.7 4.8 0.1 55 254 451 114 2 S2 FT_A HIDF 35.60004101… Wilson Inner Coastal … Norfolk … 0.36 5.8 4.3 0 35 128 435 104 3 S3 FT_A HIDF 35.60004101… Wilson Inner Coastal … Norfolk … 0.36 5.5 4.5 0.1 62 138 442 96 4 S4 FT_A HIDF 35.60004101… Wilson Inner Coastal … Norfolk … 0.32 5.9 4.7 0.1 39 200 467 122 5 S5 FT_A HIDF 35.60004101… Wilson Inner Coastal … Norfolk … 0.41 6 4.6 0.1 38 146 493 111 6 S6 FT_A HIDF 35.60004101… Wilson Inner Coastal … Norfolk … 0.41 5.1 4.9 0.1 43 157 437 106

… with 5 more variables: S , Mn , Cu , Zn , Rhizospheric_pH

rogarcia1992 avatar Sep 11 '20 23:09 rogarcia1992

Ok, I see it now. I dont have the data for Sample_Data_FT_A, but from the commentted error, I think the problem is that you dont have the version of metacoder with the deseq2 function. The function has not been tested enough yet to be in the main version, but it seems to work. You can install the development version with that function using:

devtools::install_github('grunwaldlab/metacoder@deseq2')

Try it again with that version and let me know if that fixes the problem

zachary-foster avatar Sep 12 '20 01:09 zachary-foster

Hi,

The function works but now I'm having another problem. See below.

"Error in estimateDispersionsFit(object, fitType = fitType, quiet = quiet) : all gene-wise dispersion estimates are within 2 orders of magnitude from the minimum value, and so the standard curve fitting techniques will not work. One can instead use the gene-wise estimates as final estimates: dds <- estimateDispersionsGeneEst(dds) dispersions(dds) <- mcols(dds)$dispGeneEst ...then continue with testing using nbinomWaldTest or nbinomLRT"

On Fri, Sep 11, 2020 at 9:29 PM Zachary Foster [email protected] wrote:

Ok, I see it now. I dont have the data for Sample_Data_FT_A, but from the commentted error, I think the problem is that you dont have the version of metacoder with the deseq2 function. The function has not been tested enough yet to be in the main version, but it seems to work. You can install the development version with that function using:

devtools::install_github('grunwaldlab/metacoder@deseq2')

Try it again with that version and let me know if that fixes the problem

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/grunwaldlab/metacoder/issues/291#issuecomment-691376113, or unsubscribe https://github.com/notifications/unsubscribe-auth/AQXLAHUVSSQFSRRJAUV2WZLSFLFF5ANCNFSM4QLCLHBQ .

rogarcia1992 avatar Sep 12 '20 20:09 rogarcia1992

That error seems to be coming from the deseq2 code. I dont know enough about deseq2 to provide advice on how to fix it unfortunately. If you send me your sample metadata (Sample_Data_FT_A), I might be able to replicate the error and look into it, but most likely it is an issue with deseq2 and your dataset that requires some non-default use of deseq2.

If you can get deseq2 code to work outside of metacoder for your dataset and send me the solution it would help.

zachary-foster avatar Sep 14 '20 19:09 zachary-foster

That's is what I thought. I got the deseq2 code to work, but when plotting I can't use shape nor color for my sample_id. I think if you are able to include a step to calculate geometric means prior to estimate size facotrs, the function in metacoder will work. See code below:

FT_A_DeSeq2 = subset_samples(SAM_OTUTAX_3, Field_ID=="FT_A") FT_A_DeSeq2_500 <- prune_samples(sample_sums(FT_A_DeSeq2) > 500, FT_A_DeSeq2) ##optional and parameters will change based on your data. FT_A_DeSeq2 <- phyloseq_to_deseq2(FT_A_DeSeq2, ~ Sample_ID)

##calculate geometric means prior to estimate size factors gm_mean = function(x, na.rm=TRUE){ exp(sum(log(x[x > 0]), na.rm=na.rm) / length(x)) }

geoMeans = apply(counts(FT_A_DeSeq2), 1, gm_mean) FT_A_DeSeq2 = estimateSizeFactors(FT_A_DeSeq2, geoMeans = geoMeans) FT_A_DeSeq2 = DESeq(FT_A_DeSeq2, fitType = "local")

##Create a table of the results of the tests. results_FT_A_DeSeq2 = results(FT_A_DeSeq2) results_FT_A_DeSeq2 = results_FT_A_DeSeq2[order(results_FT_A_DeSeq2$padj, na.last = NA), ] alpha = 0.05 sigtab = results_FT_A_DeSeq2[(results_FT_A_DeSeq2$padj < alpha), ] sigtab = cbind(as(sigtab, "data.frame"), as(tax_table(SAM_OTUTAX_3)[rownames(sigtab), ], "matrix")) head(sigtab) print(sigtab)

###Let’s look at just the OTUs that were significantly enriched in the carcinoma tissue. First, cleaning up the table a little for legibility. posigtab = sigtab[sigtab[, "log2FoldChange"] > 0, ] posigtab = posigtab[, c("baseMean", "log2FoldChange", "lfcSE", "padj", "Phylum", "Class", "Family", "Genus")]

head(posigtab)

####Plot Results theme_set(theme_bw()) sigtabgen = subset(sigtab, !is.na(Genus))

#Phylum order x = tapply(sigtabgen$log2FoldChange, sigtabgen$Phylum, function(x) max(x)) x = sort(x, TRUE) sigtabgen$Phylum = factor(as.character(sigtabgen$Phylum), levels=names(x))

Genus order

x = tapply(sigtabgen$log2FoldChange, sigtabgen$Genus, function(x) max(x)) x = sort(x, TRUE) sigtabgen$Genus = factor(as.character(sigtabgen$Genus), levels=names(x))

##Plot ggplot(sigtabgen, aes(y=Genus, x=log2FoldChange, color=Phylum)) + geom_vline(xintercept = 0.0, color = "gray", size = 0.5) + geom_point(size=6) + theme(axis.text.x = element_text(angle = -90, hjust = 0, vjust=0.5))

rogarcia1992 avatar Sep 14 '20 20:09 rogarcia1992

See Sample_Data_FT_A attached below. Thanks!

On Mon, Sep 14, 2020 at 3:45 PM Zachary Foster [email protected] wrote:

That error seems to be coming from the deseq2 code. I dont know enough about deseq2 to provide advice on how to fix it unfortunately. If you send me your sample metadata (Sample_Data_FT_A), I might be able to replicate the error and look into it, but most likely it is an issue with deseq2 and your dataset that requires some non-default use of deseq2.

If you can get deseq2 code to work outside of metacoder for your dataset and send me the solution it would help.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/grunwaldlab/metacoder/issues/291#issuecomment-692273617, or unsubscribe https://github.com/notifications/unsubscribe-auth/AQXLAHVIJ2YWCCXX3ASLKWTSFZXF5ANCNFSM4QLCLHBQ .

rogarcia1992 avatar Sep 14 '20 20:09 rogarcia1992