Clarification of Hi-C Haplotypes

Open • a-lud opened this issue 3 years ago • 1 comment

Hi,

I just wanted to clarify my understanding of the haplotypes produced in Hi-C mode.

Based on the paper and the docs, my interpretation of the hap1/hap2 output files when both HiFi and Hi-C data are used in the assembly process is:

  • The contigs should be haplotigs, and typically hifiasm will correctly output all the haplotigs that form a chromosome in the same haplotype file
  • The Hi-C data can phase within a chromosome (e.g. the haplotig example above), but it can't link phase across different chromosomes. Doing so would require trio data.
  • Therefore each haplotype file should typically consist of phased contig sequences (i.e. haplotigs) that together constitute chromosomes, but the combination of chromosomes within a haplotype file is likely to be a mix of maternal and paternal origin.
    • e.g. hap1.p_ctg might contain maternal chromosome 1 haplotigs but paternal chromosome 2 haplotigs, etc.
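
To make the last point concrete, here is a toy sketch in Python. It is not anything hifiasm does or outputs: the contig names, chromosome assignments and parental labels are all made up for illustration, and in practice the parental origin would only be known with trio data. The helper `summarize_haplotype` is hypothetical. It checks that every chromosome in a haplotype file has a single parental origin (phased within a chromosome) and then counts how many chromosomes come from each parent (mixed across chromosomes).

```python
from collections import Counter


def summarize_haplotype(contig_origins):
    """Summarize one haplotype file.

    contig_origins maps a contig name to (chromosome, parental_origin).
    All values here are hypothetical labels, not hifiasm output.
    Returns (phased_within_chromosomes, chromosomes_per_parent).
    """
    origins_by_chrom = {}
    for chrom, origin in contig_origins.values():
        origins_by_chrom.setdefault(chrom, set()).add(origin)

    # Phased within a chromosome: every chromosome has exactly one origin.
    phased_within = all(len(o) == 1 for o in origins_by_chrom.values())

    # Count chromosomes per parent, using only unambiguous chromosomes.
    per_parent = Counter(
        next(iter(o)) for o in origins_by_chrom.values() if len(o) == 1
    )
    return phased_within, dict(per_parent)


# Made-up contents of a hap1 file: chr1 haplotigs are maternal,
# chr2 haplotigs are paternal.
hap1 = {
    "contig_a": ("chr1", "maternal"),
    "contig_b": ("chr1", "maternal"),
    "contig_c": ("chr2", "paternal"),
    "contig_d": ("chr2", "paternal"),
}

phased_within, per_parent = summarize_haplotype(hap1)
print(phased_within)  # True
print(per_parent)     # {'maternal': 1, 'paternal': 1}
```

In this toy case hap1 is correctly phased within each chromosome, yet it holds one maternal and one paternal chromosome, which is the situation I am describing.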

Is that roughly correct?

Thanks for the help,
Al

a-lud, Feb 25 '22 03:02

Yes, there is no inter-chromosome information in Hi-C, so hifiasm cannot do that.

chhylp123, Feb 25 '22 04:02