dada2_to_picrust comparison to published picrust pipeline

Thanks for developing this use of PICRUSt! Your idea of dynamically retraining based on dada2 reads is a great fit for any program that does not use 'closed-ref' OTUs. I think this method could be widely applicable to other algorithms, including deblur and swarm.

I would love to get feedback from @mlangill and @zaneveld on how this method compares to the 'basic' usage of closed-ref greengenes OTUs.

Thank you for posting this pipeline!

Feb 28 '17 01:02 colinbrislawn

Thanks so much for the support, Colin. This pipeline is an initial attempt at "de novo" PICRUSt analysis that may be applied to/optimized for any sequenced read clustering algorithm. We are validating/tweaking the pipeline with paired 16S and shotgun metagenomic data at the moment, so keep an eye out for updates!

Mar 01 '17 03:03 vmaffei

Thank you for telling me more. Looking forward to new updates!

Mar 01 '17 16:03 colinbrislawn

Hey @colinbrislawn , just posted results from a quick validation study comparing the experimental DADA2 -> PICRUSt via ASR pipeline to the original PICRUSt pipeline. Please, let me know if you have any thoughts / suggestions!

Mar 20 '17 16:03 vmaffei

🔥 💯

Looks like it works well!

This might be a dumb question, but can you tell me what ASR means? It's used in this repo, but not in the dada2 documentation. Edit: Ancestral State Reconstruction (ASR)

I had a little trouble differentiating the three methods compared until I made this table. Once I figured this out, the comparison became much clearer.

name	OTUs method	ASR database
VSEARCH...pick_closed	closed-ref OTUs	original greengenes
DADA2...kmer	'dadas', with gg labels	original greengenes
DADA2...ASR	'dadas'	built from 'dadas'

For my money, this recalculation step is the coolest bit, because it could work with any denovo clustering method!

Mar 20 '17 17:03 colinbrislawn

PS. Thinking of de novo clustering methods...

name	OTUs method	ASR database
VSEARCH...denovo	de novo OTUs	built from OTUs
deblur...denovo	'deblurred' reads	built from reads
unoise...denovo	zOTUs	built from zOTUs

(deblur is the new error correction method from the Knight lab.) (UNOISE and zOTUs are from Robert Edgar.)

I think this would solidify your method as widely applicable to de novo methods!

Mar 20 '17 18:03 colinbrislawn

Sure thing...ASR is ancestral state reconstruction. In brief, it's a technique used in PICRUSt (and other related tools) to predict gene copy number in a yet-to-be sequenced organism based on the copy number observed in fully sequenced organisms and the taxonomic distance from the other sequenced organisms, so to speak. I'm using "ASR" loosely to refer to "genome prediction" or "recalculated database."
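To make the intuition concrete, here is a minimal toy sketch of the "predict from sequenced relatives, weighted by distance" idea. It is not PICRUSt's actual implementation (real ASR reconstructs states at internal nodes of a phylogenetic tree). The function name `predict_copy_number`, the exponential weighting, the `decay` parameter, and the toy genomes are all assumptions made up for illustration.

```python
from math import exp


def predict_copy_number(distances, copy_numbers, decay=10.0):
    """Predict a gene copy number for an organism with no sequenced genome.

    distances:    dict of reference id -> distance from the query to that
                  sequenced reference (e.g. branch length on a tree)
    copy_numbers: dict of reference id -> copy number observed in that genome
    decay:        hypothetical parameter; larger values make distant
                  references count less
    """
    # Closer references get exponentially larger weights.
    weights = {ref: exp(-decay * d) for ref, d in distances.items()}
    total = sum(weights.values())
    # Weighted mean of the observed copy numbers.
    return sum(weights[ref] * copy_numbers[ref] for ref in weights) / total


# Toy data: two close relatives with 2 copies, one distant relative with 7.
distances = {"genome_A": 0.01, "genome_B": 0.02, "genome_C": 0.50}
copy_numbers = {"genome_A": 2, "genome_B": 2, "genome_C": 7}

prediction = predict_copy_number(distances, copy_numbers)
print(f"predicted copy number: {prediction:.2f}")
```

In this toy case the prediction stays close to 2, because the distant genome contributes very little weight.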

You're definitely right about the applicability of this method to other de novo clustering algorithms. We will certainly look into deblur and de novo vsearch!

Mar 20 '17 21:03 vmaffei

Thanks. I've updated my tables accordingly, and added another modern method.

Thank you for your feedback, and building this great software.

Feel free to close this issue when you feel it's appropriate.

Mar 20 '17 22:03 colinbrislawn

Looks great. Thanks again, Colin!

Mar 20 '17 23:03 vmaffei

Robert Edgar, of MUSCLE and USEARCH fame, just released a new method of metagenome prediction, in direct competition with PICRUSt. http://biorxiv.org/content/early/2017/04/04/124156

May be worth considering.

Apr 12 '17 17:04 colinbrislawn

Hey Colin, thanks for posting this! I took a quick look at Robert Edgar's new method, which is very interesting. In short (correct me if I'm wrong), he created a new 16S reference database where each entry contains experimentally verified traits. His algorithm takes in short reads and finds a best hit (via kmer matches) within the reference sequences. Trait data from the best hit reference is then attributed to the short read. Ultimately, this method performs very well in the validation data presented.

The DADA2...kmer method is very similar to this in that short reads are "assigned" to reference sequences by best hit and any trait data attached to the reference is picked up by the short read. This makes the assumption that the short read is equivalent to the reference so long as an identity criterion/rule is met. Edgar's SINAPS makes the same assumption. In the case of closed-reference picrust, that criterion is best identity (with a lower limit of 97%). In SINAPS, that criterion is max kmer bootstrap match % (with no lower limit). In DADA2...kmer, that criterion is the same as SINAPS (with a lower limit of 80% bootstrap confidence).
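As a rough sketch of what "best hit by kmer match with a bootstrap confidence cutoff" can look like, consider the toy code below. It does not reproduce SINAPS or the DADA2...kmer code. The k-mer size, the subsample size, the helper names (`kmers`, `best_hit`, `assign`), and the toy sequences are assumptions chosen for illustration; only the 80% lower limit comes from the description above.

```python
import random


def kmers(seq, k=8):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_hit(query_kmers, ref_kmers):
    """Return (reference name, shared k-mer count) for the top reference.

    Ties are broken alphabetically so the result is deterministic.
    """
    name = max(sorted(ref_kmers), key=lambda r: len(query_kmers & ref_kmers[r]))
    return name, len(query_kmers & ref_kmers[name])


def assign(read, references, k=8, n_boot=100, min_conf=0.8, seed=0):
    """Assign a read to its best-hit reference if bootstrap support is high.

    Returns (reference name or None, bootstrap confidence).
    """
    rng = random.Random(seed)
    ref_kmers = {name: kmers(seq, k) for name, seq in references.items()}
    read_kmers = sorted(kmers(read, k))
    top, shared = best_hit(set(read_kmers), ref_kmers)
    if shared == 0:
        return None, 0.0  # nothing in the database resembles this read
    # Hypothetical choice: resample one eighth of the read's k-mers each time.
    subsample_size = max(1, len(read_kmers) // 8)
    agree = 0
    for _ in range(n_boot):
        sample = set(rng.sample(read_kmers, subsample_size))
        name, n_shared = best_hit(sample, ref_kmers)
        if n_shared > 0 and name == top:
            agree += 1
    conf = agree / n_boot
    return (top if conf >= min_conf else None), conf


# Toy references; any trait data attached to the hit would be inherited.
references = {
    "ref_A": "ACGTTGCAAGGCTTAACCGGATCGATTACGGCATGCAAGTCGA",
    "ref_B": "TTTTGGGGCCCCAAAATTTTGGGGCCCCAAAATTTTGGGGCC",
}
read = references["ref_A"][5:35]
hit, confidence = assign(read, references)
print(hit, confidence)
```

Setting `min_conf=0.0` would mimic accepting the best hit with no lower limit, while the default of 0.8 mirrors the 80% bootstrap cutoff.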

A major difference between DADA2...ASR and Edgar's method is the use of ASR to create ancestral states, which essentially assigns trait data from a consensus of related references rather than a single best hit reference. Whether this performs better than SINAPS or not needs to be tested, for sure, but my guess is that ASR would perform slightly better especially when dealing with samples not covered too well by the reference database. When samples are covered well, results from both methods will be highly comparable.

Edgar's method is complementary to ASR methods and likely runs much faster than ASR methods (especially those implemented in R), which are more computationally intensive (as mentioned in the SINAPS paper).

Apr 12 '17 18:04 vmaffei

Hello, could you please tell me whether, in the benchmarking figures, the "VSEARCH...pick_closed" method means that DADA2 ASVs were closed-ref picked against the GG database, or that raw sequences were closed-ref picked against the GG database?

Jun 26 '18 19:06 dawn-cold

Hi dawn-cold, "VSEARCH...pick_closed" method refers to raw sequences -> closed ref against GG -> PICRUSt (the original PICRUSt pipeline workflow).

Jun 26 '18 21:06 vmaffei

Thank you very much for the prompt answer! I understand. Is there any benchmarking available somewhere for the DADA2 ASVs -> closed ref against GG -> PICRUSt pipeline, like the one discussed in https://github.com/benjjneb/dada2/issues/48?

Jun 27 '18 04:06 dawn-cold

Unfortunately we haven't benchmarked that particular pipeline although it is potentially a very useful one! I am unaware of any other similar benchmarking trials that might lend more insight. If we plan to test this in the future, I will be sure to notify you (through this issue or elsewhere)! As an aside, I can't imagine why running DADA2 in advance of closed ref GG would perform poorly.

Jul 05 '18 16:07 vmaffei
