
Identify homopolymer-count variants

Open skoren opened this issue 3 years ago • 2 comments

Identify regions where the haplotypes differ only in homopolymer run length and are therefore invisible in homopolymer-compressed space. These are likely to cause consensus issues when reads from the two haplotypes are mixed into a single smashed consensus.
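To make the problem concrete, here is a minimal sketch, assuming plain homopolymer compression (each run of identical bases collapsed to one base) rather than Verkko's full simple-sequence compression. The helper names `hpc` and `homopolymer_count_variants` are hypothetical and are not part of Verkko. Two haplotypes that are identical once compressed can still differ in run lengths, and comparing the run lengths recovers exactly those hidden variants.

```python
from itertools import groupby


def hpc(seq):
    """Homopolymer-compress seq: return (compressed bases, run length of each base)."""
    bases, lengths = [], []
    for base, run in groupby(seq):
        bases.append(base)
        lengths.append(sum(1 for _ in run))
    return "".join(bases), lengths


def homopolymer_count_variants(hap_a, hap_b):
    """Return (compressed_pos, base, len_a, len_b) for every run whose length differs.

    Only defined when both haplotypes are identical in compressed space,
    which is exactly the case a compressed-space graph cannot distinguish.
    """
    comp_a, runs_a = hpc(hap_a)
    comp_b, runs_b = hpc(hap_b)
    if comp_a != comp_b:
        raise ValueError("haplotypes differ in compressed space")
    return [
        (i, base, la, lb)
        for i, (base, la, lb) in enumerate(zip(comp_a, runs_a, runs_b))
        if la != lb
    ]


hap_a = "ACGTTTTTGCAAAC"
hap_b = "ACGTTTTTTGCAAC"
print(hpc(hap_a)[0] == hpc(hap_b)[0])           # True: both compress to ACGTGCAC
print(homopolymer_count_variants(hap_a, hap_b))  # [(3, 'T', 5, 6), (6, 'A', 3, 2)]
```

Both haplotypes collapse to the same compressed sequence, so a graph built in that space would show a single homozygous path even though the two copies carry a T5/T6 and an A3/A2 difference.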

skoren, Nov 05 '21 20:11

There are some examples of poor consensus in the T2T HG002 XY assembly from the string graph pipeline, so we should check the quality in Verkko and update the consensus if needed.

skoren, Jan 13 '22 20:01

A great set of curated examples is now available here: https://github.com/marbl/HG002-issues/issues?q=is%3Aopen+is%3Aissue+label%3Apriority

A good number of missed variants create falsely homozygous regions in simple-sequence repeats. We should consider unzipping these kinds of nodes locally, either with uncompressed reads or without simple-sequence compression, to try to phase the reads. I expect the signal is there.
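One way to check whether that signal exists is sketched below, assuming each read has already been aligned uncompressed to the collapsed node and its run length measured at each candidate homopolymer site. The input layout, the `min_support` threshold, and the functions `informative_sites` and `phase_by_run_length` are all hypothetical illustrations, not Verkko's method. A site is treated as informative when at least two distinct run lengths are each supported by several reads, and reads are then grouped by their run-length signature across those sites.

```python
from collections import Counter, defaultdict


def informative_sites(read_runs, min_support=3):
    """Return site indices where >= 2 distinct run lengths each have >= min_support reads.

    read_runs maps read_id -> tuple of run lengths, one per candidate site
    (hypothetical layout; assumes every read spans every site).
    """
    n_sites = len(next(iter(read_runs.values())))
    sites = []
    for s in range(n_sites):
        counts = Counter(runs[s] for runs in read_runs.values())
        if sum(1 for c in counts.values() if c >= min_support) >= 2:
            sites.append(s)
    return sites


def phase_by_run_length(read_runs, sites):
    """Group reads by their run-length signature at the informative sites."""
    groups = defaultdict(list)
    for read_id, runs in read_runs.items():
        groups[tuple(runs[s] for s in sites)].append(read_id)
    return dict(groups)


# Toy node: three reads from each haplotype, differing at two homopolymers.
read_runs = {
    "r1": (5, 3), "r2": (5, 3), "r3": (5, 3),
    "r4": (6, 2), "r5": (6, 2), "r6": (6, 2),
}
sites = informative_sites(read_runs)
print(sites)                                   # [0, 1]
print(phase_by_run_length(read_runs, sites))
# {(5, 3): ['r1', 'r2', 'r3'], (6, 2): ['r4', 'r5', 'r6']}
```

Exact signature matching is deliberately naive: real reads carry homopolymer-length errors, so each error would produce its own small group. A practical version would need to tolerate off-by-one runs or cluster signatures rather than match them exactly.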

skoren, Feb 17 '23 19:02