Variant calling parallelization only uses major contigs

Open arahuja opened this issue 9 years ago • 3 comments

Variant calling/Mutect parallelization currently uses only the major contigs. With B38 (GRCh38), this would drop the ALT contigs.

Not dropping them makes sense, but in general variant calling likely needs to be rethought for ALT contigs. What happens when a somatic variant is mapped to an ALT in the tumor sample, but the major contig in the normal sample?
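
To make "not dropping them" concrete, here is a minimal sketch of a scatter step that keeps every contig. It assumes GRCh38 analysis-set naming, where major contigs are `chr1`..`chr22`, `chrX`, `chrY`, `chrM` and ALT contigs end in `_alt`. `split_contigs` and `scatter_regions` are hypothetical helpers for illustration, not biokepi functions.

```python
import re

# Assumption: UCSC-style names for the major contigs (chr1..chr22, chrX, chrY, chrM).
MAJOR_CONTIG = re.compile(r"chr([1-9]|1[0-9]|2[0-2]|X|Y|M)")


def split_contigs(names):
    """Partition contig names into major, ALT, and everything else."""
    major, alt, other = [], [], []
    for name in names:
        if MAJOR_CONTIG.fullmatch(name):
            major.append(name)
        elif name.endswith("_alt"):  # assumption: ALT contigs carry an "_alt" suffix
            alt.append(name)
        else:
            other.append(name)
    return major, alt, other


def scatter_regions(names):
    """One job per major contig, plus one job bundling all remaining contigs."""
    major, alt, other = split_contigs(names)
    jobs = [[name] for name in major]
    leftovers = alt + other
    if leftovers:
        jobs.append(leftovers)  # small contigs share a single job
    return jobs
```

This only ensures no contig is silently skipped. It does nothing about the harder problem of a read pair mapping to an ALT in one sample and to the major contig in the other.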

arahuja, Mar 07 '16 19:03

What happens when a somatic variant is mapped to an ALT in the tumor sample, but the major contig in the normal sample?

A false positive somatic variant.

Maybe for now we should simply avoid calling variants in polymorphic regions?
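
A rough sketch of that stopgap, assuming the polymorphic regions are available as BED-style intervals (0-based, half-open) and calls are reduced to `(contig, pos)` pairs with 0-based positions. All function names here are hypothetical, and where the interval list comes from is left open.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(regions):
    """regions: iterable of (contig, start, end), 0-based half-open as in BED."""
    by_contig = defaultdict(list)
    for contig, start, end in regions:
        by_contig[contig].append((start, end))
    index = {}
    for contig, spans in by_contig.items():
        spans.sort()
        merged = []
        for start, end in spans:
            # Merge overlapping or touching intervals so lookups need one probe.
            if merged and start <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        index[contig] = ([s for s, _ in merged], merged)
    return index


def in_excluded_region(index, contig, pos):
    """True if the 0-based position falls inside any excluded interval."""
    if contig not in index:
        return False
    starts, merged = index[contig]
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < merged[i][1]


def keep_calls(calls, index):
    """Drop (contig, pos) calls that land in an excluded region."""
    return [c for c in calls if not in_excluded_region(index, c[0], c[1])]
```

VCF positions are 1-based, so they would need to be shifted by one before being passed in.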

iskandr, Mar 07 '16 22:03

Another thing to watch out for is the effect on mapping quality (MAPQ):

  1. Does BWA work with ALT contigs in the GRCh38 release? Yes, since 0.7.11, BWA-MEM officially supports mapping to GRCh38+ALT. BWA-backtrack and BWA-SW don't properly support ALT mapping as of now. Please see README-alt.md for details. Briefly, it is recommended to use bwakit, the binary release of BWA, for generating the reference genome and for mapping.
  2. Can I just run BWA-MEM against GRCh38+ALT without post-processing? If you are not interested in hits to ALT contigs, it is okay to run BWA-MEM without post-processing. The alignments produced this way are very close to alignments against GRCh38 without ALT contigs. Nonetheless, applying post-processing helps to reduce false mappings caused by reads from the diverged part of ALT contigs and also enables HLA typing. It is recommended to run the post-processing script.

This page shows some examples: https://github.com/lh3/bwa/blob/master/README-alt.md

If we align sequence reads to GRCh38+ALT blindly, we will get many additional reads with zero mapping quality and miss variants on them.
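
One way to check how much this matters for a given alignment is to tally MAPQ-0 reads per contig. The sketch below is a hypothetical helper that reads plain SAM text lines and relies only on the standard SAM column order (FLAG in column 2, RNAME in column 3, MAPQ in column 5).

```python
from collections import Counter


def mapq_zero_fraction(sam_lines):
    """Return {contig: fraction of mapped reads with MAPQ 0} from SAM text lines."""
    total, zero = Counter(), Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag, contig, mapq = int(fields[1]), fields[2], int(fields[4])
        if flag & 0x4 or contig == "*":  # skip unmapped reads
            continue
        total[contig] += 1
        if mapq == 0:
            zero[contig] += 1
    return {contig: zero[contig] / total[contig] for contig in total}
```

The lines could come from `samtools view` output piped to stdin. Comparing the fractions for an alignment against GRCh38 with and without ALT contigs would show where mapping quality collapses.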

arahuja, Mar 08 '16 16:03

Similarly, it seems STAR suggests dropping the ALT contigs: https://github.com/alexdobin/STAR/issues/39#issuecomment-101214342

arahuja, Mar 08 '16 16:03