HapHiC icon indicating copy to clipboard operation
HapHiC copied to clipboard

Question for Work with hifiasm

Open lijl459 opened this issue 1 year ago • 5 comments

Hello,

I have a question regarding the use of haphic for genome assembly and processing results with hifiasm.

The genome I'm working with is diploid, and the haploid chromosome number is 11. In the command:

haphic pipeline allhaps.fa HiC.filtered.bam nchrs --gfa hap1.p_ctg.gfa,hap2.p_ctg.gfa

should I set nchrs to 11 or 22?

Additionally, what is the difference between this approach and directly using p_utg.fa? Will I obtain haplotype-phased assemblies if I run:

haphic pipeline p_utg.fa HiC.filtered.bam nchrs --gfa p_utg.gfa?

And in this case, should nchrs be set to 11 or 22?

Thank you!

lijl459 avatar Aug 09 '24 07:08 lijl459

In both cases, the nchrs parameter should be set to 22. Because there should be two sets of haplotypes in allhaps.fa or p_utg.fa.

Additionally, what is the difference between this approach and directly using p_utg.fa?

This question is complicated. You first need to understand the principle of how hifiasm uses Hi-C data to phase the assembly graph. Please refer to this paper of hifiasm. In short, the hap*.p_ctg sequences are fully phased and should have a higher contig N50 than that of p_utg. Although hifiasm may have some problems in dealing with autopolyploid genomes. However, in the case of a diploid genome, hifiasm is expected to work well. Hence, I would recommend using the hap*.p_ctg rather than the p_utg.

Please note that the scaffolding strategy could be different depending on the heterozygosity level. If the heterozygosity is high and there are still enough Hi-C reads after filtering with MAPQ>=1, you can use your first command to run HapHiC. If the heterozygosity is too low and there are not enough Hi-C reads after MAPQ>=1 filtering, it is suggested to align and scaffold the two sets of haplotypes separately.

zengxiaofei avatar Aug 09 '24 09:08 zengxiaofei

Thanks very much!

lijl459 avatar Aug 09 '24 09:08 lijl459

Hi, zengxiaofei

If I use this command: haphic pipeline allhaps.fa HiC.filtered.bam nchrs --gfa "hap1.p_ctg.gfa,hap2.p_ctg.gfa"

Is HiC.filtered.bam generated by allhap.fa(cat hap1.fa hap2.fa)?Like:

bwa mem -5SP -t 28 allhap.fa read1_fq.gz read2_fq.gz | samblaster | samtools view - -@ 14 -S -h -b -F 3340 -o HiC.bam HapHiC/utils/filter_bam HiC.bam 1 --nm 3 --threads 14 | samtools view - -b -@ 14 -o HiC.filtered.bam

Best wishes.

DayTimeMouse avatar Aug 14 '24 09:08 DayTimeMouse

@DayTimeMouse Yes.

zengxiaofei avatar Aug 14 '24 09:08 zengxiaofei

“Please note that the scaffolding strategy could be different depending on the heterozygosity level. If the heterozygosity is high and there are still enough Hi-C reads after filtering with MAPQ>=1, you can use your first command to run HapHiC. If the heterozygosity is too low and there are not enough Hi-C reads after MAPQ>=1 filtering, it is suggested to align and scaffold the two sets of haplotypes separately.” @zengxiaofei Hi, Zeng. For the filtered BAM file, around how much left (like how much percent?) would be considered as enough and indicated high heterozygosity?

awesomedeer avatar Aug 20 '24 04:08 awesomedeer