dada2_to_picrust
comparison to published picrust pipeline
Thanks for developing this use of PICRUSt! Your idea of dynamically retraining based on dada2 reads is a great fit for any program that does not use 'closed-ref' OTUs. I think this method could be widely applicable to other algorithms, including deblur and swarm.
I would love to get feedback from @mlangill and @zaneveld on how this method compares to the 'basic' usage of closed-ref greengenes OTUs.
Thank you for posting this pipeline!
Thanks so much for the support, Colin. This pipeline is an initial attempt at "de novo" PICRUSt analysis that may be applied to/optimized for any sequenced read clustering algorithm. We are validating/tweaking the pipeline with paired 16S and shotgun metagenomic data at the moment, so keep an eye out for updates!
Thank you for telling me more. Looking forward to new updates!
Hey @colinbrislawn , just posted results from a quick validation study comparing the experimental DADA2 -> PICRUSt via ASR pipeline to the original PICRUSt pipeline. Please, let me know if you have any thoughts / suggestions!
🔥 💯
Looks like it works well!
This might be a dumb question, but can you tell me what ASR means? It's used in this repo, but not in the dada2 documentation. Edit: Ancestral State Reconstruction (ASR)
I had a little trouble differentiating the three methods compared until I made the table below, which made the comparison much clearer:
name | OTUs method | ASR database |
---|---|---|
VSEARCH...pick_closed | closed-ref OTUs | original greengenes |
DADA2...kmer | 'dadas', with gg labels | original greengenes |
DADA2...ASR | 'dadas' | built from 'dadas' |
For my money, this recalculation step is the coolest bit, because it could work with any de novo clustering method!
PS. Thinking of de novo clustering methods...
name | OTUs method | ASR database |
---|---|---|
VSEARCH...denovo | de novo OTUs | built from OTUs |
deblur...denovo | 'deblurred' reads | built from reads |
unoise...denovo | zOTUs | built from zOTUs |
(deblur is the new error correction method from the Knight lab.) (UNOISE and zOTUs are from Robert Edgar.)
I think this would solidify your method as widely applicable to de novo methods!
Sure thing...ASR is ancestral state reconstruction. In brief, it's a technique used in PICRUSt (and other related tools) to predict gene copy number in a yet-to-be sequenced organism based on the copy number observed in fully sequenced organisms and the taxonomic distance from the other sequenced organisms, so to speak. I'm using "ASR" loosely to refer to "genome prediction" or "recalculated database."
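To make the idea concrete, here is a minimal Python sketch of predicting a copy number for an unsequenced organism from its sequenced relatives. It is a deliberate simplification: a distance-weighted average, not the ancestral state reconstruction that PICRUSt actually runs on a tree. The function name `predict_copy_number`, the `decay` parameter, and the example distances and copy numbers are all hypothetical.

```python
import math


def predict_copy_number(distances, copy_numbers, decay=10.0):
    """Distance-weighted mean of the copy numbers seen in sequenced references.

    distances:    distance from the unsequenced organism to each reference
    copy_numbers: gene copy number observed in each reference genome
    decay:        hypothetical tuning knob; larger values favor close relatives
    """
    # Closer references get exponentially larger weights.
    weights = [math.exp(-decay * d) for d in distances]
    total = sum(weights)
    return sum(w * c for w, c in zip(weights, copy_numbers)) / total


# Hypothetical example: two close relatives with 2 copies, one distant with 7.
ref_distances = [0.02, 0.05, 0.40]
ref_copies = [2, 2, 7]
estimate = predict_copy_number(ref_distances, ref_copies)
print(round(estimate, 2))
```

The estimate stays close to the value seen in the nearest relatives, which is the intuition behind "recalculating the database" for whatever sequences your clustering method produces.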
You're definitely right about the applicability of this method to other de novo clustering algorithms. We will certainly look into deblur and de novo vsearch!
Thanks. I've updated my tables accordingly, and added another modern method.
Thank you for your feedback, and building this great software.
Feel free to close this issue when you feel it's appropriate.
Looks great. Thanks again, Colin!
Robert Edgar, of MUSCLE and USEARCH fame, just released a new method of metagenome prediction, in direct competition with PICRUSt. http://biorxiv.org/content/early/2017/04/04/124156
May be worth considering.
Hey Colin, thanks for posting this! I took a quick look at Robert Edgar's new method, which is very interesting. In short (correct me if I'm wrong), he created a new 16S reference database where each entry contains experimentally verified traits. His algorithm takes in short reads and finds a best hit (via kmer matches) within the reference sequences. Trait data from the best hit reference is then attributed to the short read. Ultimately, this method performs very well in the validation data presented.
The DADA2...kmer method is very similar to this in that short reads are "assigned" to reference sequences by best hit, and any trait data attached to the reference is picked up by the short read. This assumes that the short read is equivalent to the reference so long as an identity criterion/rule is met. Edgar's SINAPS makes the same assumption. In the case of closed-reference PICRUSt, that criterion is best identity (with a lower limit of 97%). In SINAPS, that criterion is max kmer bootstrap match % (with no lower limit). In DADA2...kmer, that criterion is the same as SINAPS (with a lower limit of 80% bootstrap confidence).
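The best-hit idea with a bootstrap cutoff can be sketched roughly as follows. This is an illustration only, not the code used by SINAPS or by DADA2: it scores references by the number of shared k-mers and re-scores random k-mer subsets to get a confidence value. All names (`assign_traits`, `min_conf`, the reference IDs, and the trait values) are hypothetical.

```python
import random


def kmers(seq, k=8):
    """Set of all overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_hit(query_kmers, ref_kmers):
    """Name of the reference sharing the most k-mers, or None if none are shared."""
    best_name, best_shared = None, 0
    for name in sorted(ref_kmers):  # sorted so ties resolve deterministically
        shared = len(query_kmers & ref_kmers[name])
        if shared > best_shared:
            best_name, best_shared = name, shared
    return best_name


def assign_traits(read, references, traits, k=8, n_boot=100, min_conf=0.8, seed=0):
    """Return (best reference, bootstrap confidence, traits or None)."""
    rng = random.Random(seed)
    ref_kmers = {name: kmers(seq, k) for name, seq in references.items()}
    query = sorted(kmers(read, k))
    hit = best_hit(set(query), ref_kmers) if query else None
    if hit is None:
        return None, 0.0, None
    # Bootstrap: how often does a random subset of k-mers give the same hit?
    subset_size = max(1, len(query) // 8)
    agree = sum(
        best_hit(set(rng.sample(query, subset_size)), ref_kmers) == hit
        for _ in range(n_boot)
    )
    conf = agree / n_boot
    # Traits are inherited only when the confidence cutoff is met.
    return hit, conf, traits[hit] if conf >= min_conf else None


# Hypothetical references and trait table.
references = {
    "ref_A": "ACGTACGGTTCAGCTAGGCTTAACGGATCCGATTACGCAT",
    "ref_B": "TTTTGGGGCCCCAAAATTTTGGGGCCCCAAAATTTTGGGG",
}
traits = {"ref_A": {"16S_copies": 2}, "ref_B": {"16S_copies": 5}}
result = assign_traits(references["ref_A"][5:35], references, traits)
no_match = assign_traits("N" * 30, references, traits)
```

Setting `min_conf=0.0` mimics the "no lower limit" behavior described for SINAPS, while `min_conf=0.8` mirrors the 80% cutoff mentioned for DADA2...kmer.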
A major difference between DADA2...ASR and Edgar's method is the use of ASR to create ancestral states, which essentially assigns trait data from a consensus of related references rather than a single best hit reference. Whether this performs better than SINAPS or not needs to be tested, for sure, but my guess is that ASR would perform slightly better especially when dealing with samples not covered too well by the reference database. When samples are covered well, results from both methods will be highly comparable.
Edgar's method is complementary to ASR methods and likely runs much faster than ASR methods (especially those implemented in R), which are more computationally intensive (as mentioned in the SINAPS paper).
Hello, could you please tell me whether, in the benchmarking figures, the "VSEARCH...pick_closed" method means that DADA2 ASVs were closed-ref picked against the GG database, or that raw sequences were closed-ref picked against the GG database?
Hi dawn-cold, "VSEARCH...pick_closed" method refers to raw sequences -> closed ref against GG -> PICRUSt (the original PICRUSt pipeline workflow).
Thank you very much for the prompt answer! I understand. Is there any benchmarking available for the DADA2 ASVs -> closed ref against GG -> PICRUSt pipeline somewhere, as discussed in https://github.com/benjjneb/dada2/issues/48?
Unfortunately we haven't benchmarked that particular pipeline although it is potentially a very useful one! I am unaware of any other similar benchmarking trials that might lend more insight. If we plan to test this in the future, I will be sure to notify you (through this issue or elsewhere)! As an aside, I can't imagine why running DADA2 in advance of closed ref GG would perform poorly.