Using Flagger for de novo primary assemblies of diploid species
Hello!
We are working on a number of rodent assemblies from animals for which we do not have trio or Hi-C data. We have HiFi and (usually) ONT UL reads for these species, and while we can get some phasing information from the HiFi reads alone, the primary assemblies typically look more complete than the Hap1/Hap2 assemblies from hifiasm or Verkko. We tried Flagger without providing phasing information (fastq --> bam (with minimap2) --> Flagger quick start), and it did a pretty good job of catching obvious errors and collapses (see example of collapse below).
With this in mind, could you please let me know what we lose by running Flagger without phasing information, and whether you can see it leading to major false positives?
Quick edit: I saw a comment where you said "That is actually the assumption of HMM-Flagger that the input (bam file) contains phased read mappings". Does this refer to a haplotagged bam file?
Best, Dustin
@DustinSokolowski By phased read mappings I meant having each read mapped to the correct haplotype, assuming that the assembly is phased and diploid. There is no need to haplotag the bam file.
Thanks! Is Flagger usable if the assembly cannot be fully phased?
HMM-Flagger assumes that each haplotype is assembled into a separate contig. It does not care about phasing switch errors: as long as both copies are assembled, it works fine. However, if the haplotypes are collapsed into one contig for any part of the genome, that part will be flagged.
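For readers following the thread, the coverage intuition behind this reply can be sketched in a few lines of Python. This is only a toy threshold classifier, not HMM-Flagger's actual model: the function name `classify_windows`, the ratio cutoffs, and the "haploid" label for correctly separated regions are all assumptions made for illustration.

```python
from statistics import median


def classify_windows(coverages, haploid_cov=None):
    """Label windows by read depth relative to the expected per-haplotype depth.

    Hypothetical helper: the cutoffs below are illustrative assumptions,
    not values used by HMM-Flagger.
    """
    if haploid_cov is None:
        # Assumption: most windows are correctly separated haplotypes,
        # so the median depth approximates the per-haplotype depth.
        haploid_cov = median(coverages)

    labels = []
    for cov in coverages:
        ratio = cov / haploid_cov
        if ratio < 0.25:
            labels.append("error")       # almost no read support
        elif ratio < 0.75:
            labels.append("duplicated")  # reads split across extra copies
        elif ratio < 1.5:
            labels.append("haploid")     # one haplotype's worth of reads
        else:
            labels.append("collapsed")   # both haplotypes' reads on one contig
    return labels


# Phased assembly at roughly 30x per haplotype with one collapsed window at 60x.
print(classify_windows([30, 31, 29, 60, 30]))
```

In this sketch the collapsed window stands out only because the other windows set the per-haplotype baseline. Estimating that baseline from the median is reasonable only when most of the assembly has its haplotypes on separate contigs; otherwise `haploid_cov` would have to be supplied explicitly.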
Thank you!
What if the entire genome is collapsed (i.e., it is a primary assembly)? Would the whole assembly be flagged, or would the training step recalibrate to the total coverage, so that we would lose the "duplicated" tag but still get "collapsed" and "error"?
Thanks, Dustin
Same question here: can we apply Flagger to a primary assembly (such as the one from hifiasm, prefix.hic.p_ctg.gfa) paired with an alternate assembly?