dada2 icon indicating copy to clipboard operation
dada2 copied to clipboard

Extract all merged sequences (not unique) after mergers

Open ArmOli opened this issue 2 years ago • 3 comments

Dear developers,

I am working with Illumina Miseq 300X2 data set of antibodies repertoires. the amplicons are > then 600 bp, in the mergePairs function I added justConcatenate = TRUE because the overlap is not possibile. The function returned a data frame with the sequences in the first column and the aboundance in the second coloumn. I assumed that the sequences reported in the first column are the unique pairs that results from the merging step.

for example

sequence | abundance | forward | reverse CGTATCGCCTCCCTCGC... | 99 | 11 | 6

I would like to extract all the 17174 of merged sequences that are reported in this summary

input filtered denoisedF denoisedR merged nonchim 68750 38824 35213 17622 17174 16072

I already extracted the mergers$sequence in a fasta file using the function write.fasta(as.list(mergers$sequence), sequencename, fastaname, open = "w", nbchar = 60, as.string = FALSE) present in the packages Seqinr. But in total they are 1301 unique seqeunces and i suspect that somenthing did wrong during the merging.

Is it possible to extract all the 17174 not unique sequences that have been merged ? I didnt find such a list of sequences anywhere.

please let me know and thanks in advance for the support and for the amazing pipeline.

ArmOli avatar Oct 21 '21 17:10 ArmOli

Sure, you can just copy each unique sequence by the number of times it was present in the data (its "abundance").

seq.rereplicated <- rep(mergers$sequence, times=mergers$abundance)

Then proceed using that vector of sequences.

benjjneb avatar Oct 22 '21 22:10 benjjneb

justConcatenate = TRUE changes mergers[[1]]$sequence to a list instead of a vector. This leads to inconsistent downstream processing. One must run something like:

if(class(mergers[[1]]$sequence) == 'list'){
  mergers[[1]]$sequence = unlist(mergers[[1]]$sequence)
}

nick-youngblut avatar Jun 12 '22 11:06 nick-youngblut

Yep, never noticed that before. The offending code is an extraneous SIMPLIFY=FALSE argument here: https://github.com/benjjneb/dada2/blob/2e8360d08912b429533d1495198a4ac02e00ba32/R/paired.R#L143

benjjneb avatar Jun 14 '22 14:06 benjjneb