Using flagger for de novo primary assemblies of diploid species

Open DustinSokolowski opened this issue 11 months ago • 5 comments

Hello!

We are working on a number of rodent assemblies from animals for which we have no trio or Hi-C data. We have HiFi and (usually) ONT UL reads for these species, and while we can get some phasing information from the HiFi reads alone, the primary assemblies typically look more complete than the Hap1/Hap2 assemblies from hifiasm or Verkko. We tried Flagger without providing phasing information (FASTQ --> BAM with minimap2 --> Flagger quick start), and it did a pretty good job of catching obvious errors and collapses (see the example of a collapse below).

[Image: example of a collapse]

With this in mind, could you let me know what we lose by running Flagger without phasing information, and whether you can see it leading to major false positives?

Quick edit: I saw a comment where you said "That is actually the assumption of HMM-Flagger that the input (bam file) contains phased read mappings". Does this refer to a haplotagged BAM file?

Best, Dustin

DustinSokolowski • Feb 05 '25 03:02

@DustinSokolowski By phased read mappings I meant having each read mapped to the correct haplotype, assuming that the assembly is phased and diploid. There is no need to haplotag the BAM file.

mobinasri • Feb 16 '25 19:02

Thanks! Is flagger usable if the assembly cannot be fully phased?

dsokolo • Feb 16 '25 19:02

HMM-Flagger assumes that each haplotype is assembled in a separate contig. It does not care about phasing switch errors. As long as both copies are assembled, it works fine. However, if the haplotypes are collapsed into one contig for any part of the genome, those regions will be flagged.
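
To make the coverage intuition concrete, here is a toy sketch. It is not HMM-Flagger's actual model: the function name, the ratio thresholds, and the example coverage values are all made up for illustration, and the only assumption is that reads from each haplotype map to that haplotype's own contig at roughly one expected per-haplotype coverage.

```python
def classify_window(coverage, haploid_coverage):
    """Label one window by its coverage relative to the expected
    per-haplotype coverage. Thresholds are illustrative assumptions,
    not values taken from HMM-Flagger."""
    if haploid_coverage <= 0:
        raise ValueError("haploid_coverage must be positive")
    ratio = coverage / haploid_coverage
    if ratio < 0.25:
        return "error"       # almost no reads support this sequence
    if ratio < 0.75:
        return "duplicated"  # reads are split across extra copies
    if ratio < 1.5:
        return "haploid"     # one haplotype's worth of reads, as expected
    return "collapsed"       # reads from both haplotypes pile up on one contig


# Hypothetical per-window coverages with an expected per-haplotype coverage of 30x.
windows = [30, 29, 61, 14, 3]
labels = [classify_window(c, haploid_coverage=30) for c in windows]
print(labels)
```

In this sketch, the window at 61x gets the collapsed label because it carries about twice the per-haplotype coverage, which is what you would expect where two haplotypes are merged into a single contig.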

mobinasri • Feb 16 '25 20:02

Thank you!

What if the entire genome is collapsed (i.e., it is a primary assembly)? Would the whole assembly be flagged, or would the training step recalibrate to the total coverage, so that we would lose the "duplicated" tag but still get "collapsed" and "error"?

Thanks, Dustin

DustinSokolowski • Feb 16 '25 21:02

Same question here: can we apply Flagger to a primary assembly (such as the one from Hifiasm, prefix.hic.p_ctg.gfa) paired with an alternative assembly?

JHCCoder • May 27 '25 01:05