fastq_pctid option for fastq_mergepairs
I'm trying to replicate with vsearch a read pair assembly that I've tested with usearch. The usearch documentation suggests raising fastq_pctid and fastq_maxdiffs from their default values of 90% and 5, respectively, as the amount of overlap between the reads increases. vsearch has fastq_maxdiffs, which defaults to 10, and fastq_maxdiffpct, which defaults to 100%. I think fastq_maxdiffpct is 100% minus usearch's fastq_pctid value. Is that right?
Also, from your documentation it sounds like you're saying that fastq_maxdiffpct is superfluous:
There are other more sophisticated rules in the merging algorithm that will discard read pairs with a high fraction of mismatches
What are you all counting as those "more sophisticated rules"?
Yes, the argument to the vsearch fastq_maxdiffpct option should be equivalent to 100% minus the argument to the usearch fastq_pctid option (for example, a fastq_pctid of 90 corresponds to a fastq_maxdiffpct of 10). The fastq_pctid option must be a rather new option introduced in a recent usearch version. Earlier usearch versions had the fastq_maxdiffpct option.
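As a rough illustration of that conversion, here is a minimal Python sketch that turns a usearch-style fastq_pctid value into the corresponding fastq_maxdiffpct value and assembles a vsearch command line. The helper names and the file names (R1.fastq, R2.fastq, merged.fastq) are hypothetical and only serve the example; the sketch assumes both values are expressed as percentages.

```python
def pctid_to_maxdiffpct(pctid: float) -> float:
    """Convert a minimum percent identity (usearch fastq_pctid) into a
    maximum percent of differences (vsearch fastq_maxdiffpct)."""
    if not 0.0 <= pctid <= 100.0:
        raise ValueError("pctid must be between 0 and 100")
    return 100.0 - pctid


def build_merge_command(forward, reverse, out, pctid, maxdiffs):
    """Assemble an argument list for vsearch --fastq_mergepairs.

    Hypothetical helper: it only builds the list and does not run vsearch.
    """
    return [
        "vsearch",
        "--fastq_mergepairs", forward,
        "--reverse", reverse,
        "--fastq_maxdiffs", str(maxdiffs),
        "--fastq_maxdiffpct", str(pctid_to_maxdiffpct(pctid)),
        "--fastqout", out,
    ]


# Mirror the usearch defaults mentioned above: pctid 90, maxdiffs 5.
command = build_merge_command("R1.fastq", "R2.fastq", "merged.fastq", 90, 5)
print(" ".join(command))
```

The resulting list could be passed to subprocess.run if vsearch is installed. Identical thresholds do not guarantee identical merged output, since the two tools may differ in other parts of the merging procedure.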
Yes, I think the fastq_maxdiffpct option is superfluous, except for special cases. I think simply looking at the percentage of matching bases in the overlap is a bit too simplistic. The method in vsearch includes computing alignment scores for the overlapping region, checking for sudden drops in that score, taking the quality of the bases into account in the scoring, and checking for multiple alternative alignments. I will provide a more detailed description of the algorithm used in vsearch soon.