mergeSequenceTables()

Open ntromas opened this issue 2 years ago • 6 comments

Hi Ben,

I have to analyze >20 runs, and I ran dada2 on each run separately. I would like to combine them using mergeSequenceTables(). In many cases, the people who prepared the libraries sequenced the same sample again in 2 or 3 different runs because it had not worked properly. Instead of using repeats="sum", I wonder if there is a simple way to keep, for each duplicated sample, only the copy with the highest number of reads. Not sure if that makes sense.

Thanks for your help!

Nico

ntromas avatar Oct 12 '21 15:10 ntromas

We haven't implemented that logic, but this is a reasonable enhancement request to add a repeats="deepest" mode that performs as you describe.

For now, you can do this in R, but you will need to write a bit of custom code. You can pull out the duplicated sample names pretty easily:

# unlist(lapply(...)) gives a flat character vector even when the tables have different numbers of samples
all.sams <- unlist(lapply(list(st1, st2, st3), rownames))
dupes <- unique(all.sams[duplicated(all.sams)])

Then you can loop over the sequence tables for each duplicated sample name and delete that sample (row) from every table except the one where it has the highest depth. Then pass the filtered sequence tables to mergeSequenceTables.
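The loop described above could be sketched roughly as follows. This is a minimal base-R sketch, not a dada2 feature: `keep_deepest` is a hypothetical helper name, the three toy tables are made-up stand-ins for real per-run sequence tables (samples in rows, ASVs in columns), and ties in depth are resolved in favor of the earlier table.

```r
# Toy sequence tables; all sample names, sequences and counts are invented.
st1 <- matrix(c(10L, 5L, 0L, 2L), nrow = 2, byrow = TRUE,
              dimnames = list(c("A", "B"), c("ACGT", "TTGA")))
st2 <- matrix(c(100L, 50L, 1L, 1L), nrow = 2, byrow = TRUE,
              dimnames = list(c("A", "C"), c("ACGT", "GGCC")))
st3 <- matrix(c(3L, 3L), nrow = 1,
              dimnames = list("B", c("ACGT", "TTGA")))

# Hypothetical helper: for each sample name, keep only the row from the
# table in which that sample has the most reads. Assumes non-empty tables.
keep_deepest <- function(tables) {
  # One row per (table, sample) pair with its total read count.
  depth <- do.call(rbind, lapply(seq_along(tables), function(i) {
    data.frame(table = i,
               sample = rownames(tables[[i]]),
               reads = unname(rowSums(tables[[i]])),
               row.names = NULL,
               stringsAsFactors = FALSE)
  }))
  # Sort by sample, deepest first; the first row per sample is the one to keep.
  depth <- depth[order(depth$sample, -depth$reads), ]
  best <- depth[!duplicated(depth$sample), ]
  filtered <- lapply(seq_along(tables), function(i) {
    keep <- best$sample[best$table == i]
    tables[[i]][rownames(tables[[i]]) %in% keep, , drop = FALSE]
  })
  # Drop tables that lost all of their samples.
  Filter(function(x) nrow(x) > 0, filtered)
}

filtered <- keep_deepest(list(st1, st2, st3))
```

Since no sample name is repeated after filtering, the result should be usable as `mergeSequenceTables(tables = filtered)` without setting `repeats`.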

benjjneb avatar Oct 12 '21 19:10 benjjneb

Thanks! Will try that. I was thinking of creating a new table listing all duplicates, the total read count for each of them, and the table each one came from, and then removing those with the lowest read count.

ntromas avatar Oct 12 '21 21:10 ntromas

Hi Ben,

We found a way to remove duplicates, but I just found a potential batch effect, and I was planning to add an extra step to remove contaminants using our negative controls (blanks). As I have multiple runs and therefore multiple blanks, what would be the best approach to remove them? I am also planning to use a PCR correction; what do you think?

Thanks for the help!

Nico

ntromas avatar Feb 14 '22 18:02 ntromas

Hi Nico, On the topic of contaminants, you may want to take a look at our paper and the decontam software package (https://github.com/benjjneb/decontam) for dealing with contamination: https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-018-0605-2

That lays out our thinking on dealing with contaminants. If you have just one negative control per "batch", your options on what to do are a bit more limited than what we consider in the decontam paper. Probably you want to identify clear contaminant features (e.g. ASVs) from the single negative control in each batch, and remove those identified that way from all batches.
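One way the single-blank-per-batch idea above might look in code is sketched below. This is only an illustration under stated assumptions, not the decontam method: the toy table, the helper name `flag_in_blank`, and both cutoffs (`min_reads`, `min_frac`) are hypothetical, and what counts as a "clear" contaminant would have to be decided for the real data.

```r
# Toy merged table: samples in rows, ASVs in columns. All values are invented.
seqtab <- matrix(c(500L, 20L, 0L,
                   450L,  0L, 5L,
                     2L, 90L, 0L,   # blank from run 1
                     1L,  0L, 1L),  # blank from run 2
                 nrow = 4, byrow = TRUE,
                 dimnames = list(c("s1", "s2", "blank_run1", "blank_run2"),
                                 c("asv1", "asv2", "asv3")))
is_neg <- c(FALSE, FALSE, TRUE, TRUE)

# Hypothetical cutoffs for calling an ASV a "clear" contaminant in a blank.
min_reads <- 10   # at least this many reads in the blank
min_frac  <- 0.5  # and at least this fraction of the blank's reads

# Hypothetical helper: names of ASVs that pass both cutoffs in one blank.
flag_in_blank <- function(blank_counts, min_reads, min_frac) {
  total <- sum(blank_counts)
  if (total == 0) return(character(0))
  hit <- blank_counts >= min_reads & blank_counts / total >= min_frac
  names(blank_counts)[hit]
}

# Union of the ASVs flagged in any blank, across all batches.
contaminants <- unique(unlist(lapply(which(is_neg), function(i) {
  flag_in_blank(seqtab[i, ], min_reads, min_frac)
})))

# Remove the flagged ASVs (and the blanks themselves) from every batch.
cleaned <- seqtab[!is_neg, !(colnames(seqtab) %in% contaminants), drop = FALSE]
```

The read-count cutoff is there so that a very shallow blank (like `blank_run2` here) does not flag a real, abundant taxon on the strength of one or two stray reads. With several negative controls per batch, decontam's `isContaminant()` with its `neg` and `batch` arguments covers the cases considered in the paper.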

benjjneb avatar Feb 15 '22 16:02 benjjneb

Hi Ben,

Thanks for this information! I wonder what would happen if, let's say, I have 3 negative controls but one of them seems to have been cross-contaminated by another sample.

Cheers,

Nico

ntromas avatar Feb 16 '22 16:02 ntromas

Then you can't just remove all the taxa you observe in the cross-contaminated negative control.

benjjneb avatar Feb 17 '22 12:02 benjjneb