dada2
dada2 copied to clipboard
Extract all merged sequences (not unique) after mergers
Dear developers,
I am working with Illumina Miseq 300X2 data set of antibodies repertoires.
the amplicons are > then 600 bp, in the mergePairs function I added justConcatenate = TRUE
because the overlap is not possibile.
The function returned a data frame with the sequences in the first column and the aboundance in the second coloumn. I assumed that the sequences reported in the first column are the unique pairs that results from the merging step.
for example
sequence | abundance | forward | reverse CGTATCGCCTCCCTCGC... | 99 | 11 | 6
I would like to extract all the 17174 of merged sequences that are reported in this summary
input filtered denoisedF denoisedR merged nonchim 68750 38824 35213 17622 17174 16072
I already extracted the mergers$sequence
in a fasta file using the function write.fasta(as.list(mergers$sequence), sequencename, fastaname, open = "w", nbchar = 60, as.string = FALSE)
present in the packages Seqinr. But in total they are 1301 unique seqeunces and i suspect that somenthing did wrong during the merging.
Is it possible to extract all the 17174 not unique sequences that have been merged ? I didnt find such a list of sequences anywhere.
please let me know and thanks in advance for the support and for the amazing pipeline.
Sure, you can just copy each unique sequence by the number of times it was present in the data (its "abundance").
seq.rereplicated <- rep(mergers$sequence, times=mergers$abundance)
Then proceed using that vector of sequences.
justConcatenate = TRUE
changes mergers[[1]]$sequence
to a list instead of a vector. This leads to inconsistent downstream processing. One must run something like:
if(class(mergers[[1]]$sequence) == 'list'){
mergers[[1]]$sequence = unlist(mergers[[1]]$sequence)
}
Yep, never noticed that before. The offending code is an extraneous SIMPLIFY=FALSE
argument here: https://github.com/benjjneb/dada2/blob/2e8360d08912b429533d1495198a4ac02e00ba32/R/paired.R#L143