Base quality and consensus generation

Open nriddiford opened this issue 2 years ago • 7 comments

First off - thanks for all the great work on tracy. It's quite amazing to me how few tools there are for performing trace file assembly - so thanks for filling this void with a very nice tool!

I have been using tracy quite a bit recently to assemble trace files and perform variant calling relative to a reference sequence. Generally, this works very well, but I have a question (related to a previous issue) on the interplay between base-call confidence (on the chromatogram) and consensus formation.

I'm seeing incorrect consensus calls being made for a particular base where one of the trace files contains a low-confidence call and the other a high-confidence call. From what I understand (based on your previous explanation), tracy does not use the base quality from the chromatogram, and I guess just chooses one base over the other when there's a disagreement?

Here's what I'm seeing:

[Image: Screenshot 2022-04-06 at 16 19 32]

This shows 2 trace files in Geneious. When I assemble these using tracy assemble --format fastq --inccons trace1.ab1 trace2.ab1, the resulting consensus contains insertions at both positions highlighted in red. This is strange to me, as the base quality in trace 2 is clearly higher than in trace 1. Or is it the case that with an insertion in one trace file, there is no base to compare to in the second trace file, so the insertion is included in the consensus irrespective of quality?

Is this expected behaviour?

Thanks for any help!

nriddiford avatar Apr 06 '22 14:04 nriddiford

Just to add, I've also been wondering about this!

blex-max avatar Apr 07 '22 09:04 blex-max

@tobiasrausch forgive me for pinging you, but are you intending to respond to this?

blex-max avatar May 12 '22 08:05 blex-max

For tracy assemble it's a simple majority vote. If you have, for instance, 3 traces, and 2 support a gap (-) while 1 supports a nucleotide, then the gap (-) is chosen. Ties are broken arbitrarily, and tracy assemble does not take qualities into account at the moment. For the pairwise case, tracy consensus does use the qualities, but gaps don't have any to begin with. Therefore, for tracy consensus it depends on whether you use -i or not.

tobiasrausch avatar May 12 '22 10:05 tobiasrausch

@tobiasrausch Thanks for the clarification. How about cases where you have 2 traces (like the image above)? Is it just a 50:50 chance to incorporate a low-quality insertion?

nriddiford avatar May 13 '22 14:05 nriddiford

Indeed, it's a 50:50 chance in theory but in order to make the algorithm deterministic the code currently favours nucleotides over gaps.
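
To make the behaviour discussed in this thread concrete, here is a minimal Python sketch of a per-column majority vote in which ties are resolved in favour of nucleotides over gaps. It is an illustration only, not tracy's actual code: the function names (consensus_column, consensus), the use of pre-aligned strings as input, and the alphabetical rule for ties between two different nucleotides are all assumptions made for this example.

```python
from collections import Counter

GAP = "-"  # gap symbol used in the aligned rows (assumption for this sketch)


def consensus_column(column):
    """Pick one symbol for a single alignment column by majority vote.

    Qualities are deliberately ignored, mirroring the behaviour described
    for `tracy assemble` in this thread.
    """
    counts = Counter(column)
    best = max(counts.values())
    winners = [symbol for symbol, n in counts.items() if n == best]
    # Deterministic tie-break: any nucleotide beats a gap.
    # Ties between different nucleotides are broken alphabetically here;
    # that particular rule is hypothetical, chosen only for determinism.
    nucleotides = sorted(s for s in winners if s != GAP)
    return nucleotides[0] if nucleotides else GAP


def consensus(aligned_rows):
    """Build a gapped consensus from equal-length, already-aligned rows."""
    return "".join(consensus_column(col) for col in zip(*aligned_rows))


if __name__ == "__main__":
    # Three traces: two support a gap, one a nucleotide, so the gap wins.
    print(consensus(["AC-T", "AC-T", "ACGT"]))
    # Two traces: the 1:1 tie goes to the nucleotide, so the insertion is kept.
    print(consensus(["AC-T", "ACGT"]))
```

With only two traces, every insertion present in a single trace produces a 1:1 tie against a gap, so under this rule it always enters the consensus regardless of its base quality, which matches the behaviour reported at the top of the thread.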

tobiasrausch avatar May 16 '22 14:05 tobiasrausch

Is this likely to change? As @nriddiford says, it seems a shame to have a 50% chance to incorporate a low quality base when the information is available to make the better call.

blex-max avatar May 16 '22 15:05 blex-max

I think tracy consensus in tracy v0.7.5 now properly handles the low-quality vs. high-quality base problem, but low-quality insertions vs. gaps is still something I need to work on. Do you have some example traces you can share with me where you think the insertion is incorrect? Thanks.

tobiasrausch avatar Apr 12 '23 10:04 tobiasrausch