verkko
Identify homopolymer-count variants
Identify regions where the haplotypes differ only by homopolymer count and are therefore invisible in homopolymer-compressed space. These are likely to cause consensus issues when the two haplotypes are mixed into a single smashed consensus.
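To make the problem concrete, here is a minimal Python sketch (not Verkko code) of why such variants vanish in compressed space. The function names and the toy sequences are hypothetical and chosen only for illustration. Both haplotypes are run-length encoded and compared: if the compressed sequences match, the only differences left are run lengths.

```python
from itertools import groupby


def run_length_encode(seq):
    """Return a list of (base, run_length) pairs for a DNA string."""
    return [(base, len(list(run))) for base, run in groupby(seq)]


def homopolymer_compress(seq):
    """Collapse every homopolymer run to a single base."""
    return "".join(base for base, _ in run_length_encode(seq))


def homopolymer_only_differences(hap_a, hap_b):
    """Report run-length differences between two haplotype sequences.

    Returns None if the haplotypes differ in compressed space (so the
    difference is visible to a compressed graph). Otherwise returns a list
    of (compressed_index, base, run_len_a, run_len_b) for every run whose
    length differs.
    """
    rle_a = run_length_encode(hap_a)
    rle_b = run_length_encode(hap_b)
    if [base for base, _ in rle_a] != [base for base, _ in rle_b]:
        return None
    return [
        (i, base, len_a, len_b)
        for i, ((base, len_a), (_, len_b)) in enumerate(zip(rle_a, rle_b))
        if len_a != len_b
    ]


# Toy haplotypes (made up): they differ only in the length of the T run.
hap_a = "ACGTTTTTGA"
hap_b = "ACGTTTTGA"

print(homopolymer_compress(hap_a) == homopolymer_compress(hap_b))
print(homopolymer_only_differences(hap_a, hap_b))
```

In this toy case both haplotypes compress to the same string, so a compressed graph would place them in a single node, and the 5-vs-4 T run is only recoverable from uncompressed sequence.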
There are some examples of poor consensus in T2T HG002 XY from the string graph pipeline, so we should check the quality in Verkko and update the consensus if needed.
A great set of curated examples is now available here: https://github.com/marbl/HG002-issues/issues?q=is%3Aopen+is%3Aissue+label%3Apriority
A good number of missed variants create falsely homozygous regions in simple-sequence repeats. We should consider unzipping these kinds of nodes locally with uncompressed reads, or without simple-sequence compression, to try to phase the reads. I expect the signal is there.
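As a rough illustration of the kind of signal meant here, the sketch below takes the run length each uncompressed read reports at one homopolymer site and checks whether more than one length is well supported. This is not how Verkko works: the function name, the thresholds (`min_fraction`, `min_reads`), and the observed values are assumptions made up for the example.

```python
from collections import Counter


def candidate_run_length_alleles(read_run_lengths, min_fraction=0.25, min_reads=3):
    """Return the run lengths supported by enough reads to be an allele.

    read_run_lengths: one observed homopolymer length per read at a site.
    min_fraction and min_reads are hypothetical thresholds, not tuned values.
    """
    counts = Counter(read_run_lengths)
    total = len(read_run_lengths)
    return sorted(
        length
        for length, n in counts.items()
        if n >= min_reads and n / total >= min_fraction
    )


# Made-up observations at one site: two well-supported lengths plus noise.
observed = [12, 12, 11, 12, 14, 14, 14, 13, 14, 12]
alleles = candidate_run_length_alleles(observed)
print(alleles)

# More than one supported length suggests the node is falsely homozygous
# and might be worth unzipping locally with uncompressed reads.
print(len(alleles) > 1)
```

A site with two supported lengths would be a candidate for local phasing. Real reads have length-dependent homopolymer error, so any practical version would need an error model rather than fixed thresholds.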