
fastq_pctid option for fastq_mergepairs

Open · pschloss opened this issue 6 years ago · 1 comment

I'm trying to replicate with vsearch a read pair assembly that I've tested with usearch. The usearch documentation suggests raising fastq_pctid and fastq_maxdiffs from their default values of 90% and 5, respectively, as the amount of overlap between the reads increases. vsearch has fastq_maxdiffs, which defaults to 10, and fastq_maxdiffpct, which defaults to 100%. I think fastq_maxdiffpct is 100% minus usearch's fastq_pctid value. Is that right?

Also, from your documentation it sounds like you're saying that fastq_maxdiffpct is superfluous:

There are other more sophisticated rules in the merging algorithm that will discard read pairs with a high fraction of mismatches

What are you all counting as those "more sophisticated rules"?

pschloss, May 18 '18 20:05

Yes, the argument to the vsearch fastq_maxdiffpct option should be equivalent to 100 minus the argument to the usearch fastq_pctid option, since both are given as percentages (for example, a fastq_pctid of 90 corresponds to a fastq_maxdiffpct of 10). The fastq_pctid option must be a rather new option introduced in a recent usearch version. Earlier usearch versions had the fastq_maxdiffpct option.
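As a rough illustration of that conversion, the sketch below turns a usearch-style minimum percent identity into the corresponding maximum percent of differences and assembles a vsearch command line from it. The helper names (`pctid_to_maxdiffpct`, `build_mergepairs_command`) and the file names are hypothetical and chosen only for this example; the vsearch options themselves (`--fastq_mergepairs`, `--reverse`, `--fastq_maxdiffs`, `--fastq_maxdiffpct`, `--fastqout`) are the documented ones. The command is only built and printed, not executed.

```python
def pctid_to_maxdiffpct(pctid: float) -> float:
    """Convert a minimum percent identity (usearch fastq_pctid style)
    into a maximum percent of differences (vsearch fastq_maxdiffpct style)."""
    if not 0.0 <= pctid <= 100.0:
        raise ValueError("pctid must be between 0 and 100")
    return 100.0 - pctid


def build_mergepairs_command(forward: str, reverse: str, out: str,
                             pctid: float, maxdiffs: int) -> list[str]:
    """Assemble (but do not run) a vsearch read-merging command.

    The file names passed in are placeholders supplied by the caller.
    """
    return [
        "vsearch",
        "--fastq_mergepairs", forward,
        "--reverse", reverse,
        "--fastq_maxdiffs", str(maxdiffs),
        "--fastq_maxdiffpct", str(pctid_to_maxdiffpct(pctid)),
        "--fastqout", out,
    ]


if __name__ == "__main__":
    # Hypothetical input/output names; usearch defaults of 90% and 5 as quoted above.
    cmd = build_mergepairs_command("R1.fastq", "R2.fastq", "merged.fastq",
                                   pctid=90, maxdiffs=5)
    print(" ".join(cmd))
```

Returning the command as a list makes it easy to pass to `subprocess.run` later without shell quoting issues, if one chooses to run it.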

Yes, I think fastq_maxdiffpct is superfluous, except in special cases. Simply looking at the percentage of matching bases in the overlap is, in my view, a bit too simplistic. The method in vsearch includes computing alignment scores for the overlapping region, checking for sudden drops in that score, taking the quality of the bases into account in the scoring, and checking for multiple alternative alignments. I will provide a more detailed description of the algorithm used in vsearch soon.
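For comparison, the "simplistic" check described above can be sketched as follows. This is only an illustration of counting mismatches in an overlap, not vsearch's merging algorithm, which relies on the score-based rules listed above. It assumes the two overlap strings are already aligned without gaps and that the reverse read has already been reverse-complemented; the function names and the "at most" threshold comparison are assumptions made for this example, while the default thresholds of 10 and 100% are the vsearch defaults mentioned in the question.

```python
def overlap_differences(fwd_overlap: str, rev_overlap: str) -> tuple[int, float]:
    """Count mismatches between two equal-length, gap-free overlap strings.

    Returns (number of mismatches, percentage of mismatching positions).
    """
    if len(fwd_overlap) != len(rev_overlap):
        raise ValueError("overlap strings must have the same length")
    if not fwd_overlap:
        raise ValueError("overlap must not be empty")
    diffs = sum(a != b for a, b in zip(fwd_overlap, rev_overlap))
    return diffs, 100.0 * diffs / len(fwd_overlap)


def passes_simple_filter(fwd_overlap: str, rev_overlap: str,
                         maxdiffs: int = 10, maxdiffpct: float = 100.0) -> bool:
    """Accept the pair if both the mismatch count and the mismatch
    percentage are at most the given thresholds (assumed semantics)."""
    diffs, pct = overlap_differences(fwd_overlap, rev_overlap)
    return diffs <= maxdiffs and pct <= maxdiffpct


if __name__ == "__main__":
    fwd = "ACGTACGTAC"
    rev = "ACGTTCGTAC"  # differs from fwd at one of ten positions
    print(overlap_differences(fwd, rev))
    print(passes_simple_filter(fwd, rev, maxdiffs=5, maxdiffpct=5.0))
```

A check like this ignores base qualities and where the mismatches fall in the overlap, which is the limitation the score-based rules are meant to address.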

torognes, May 24 '18 12:05