
High hamming/switch error when evaluating hifiasm+Hi-C assembly using yak

Open Yichen-HNU opened this issue 8 months ago • 2 comments

Dear chhylp123,

I assembled HG002 using the hifiasm+Hi-C module with the following command:

/home/work/yichen/hifiasm/hifiasm -o HG002.asm -t 80 $read1 $read2 $read3 $read4 $read5 $read6 --h1 $hic1 --h2 $hic2

I used ~40X HiFi reads. The Hi-C data consists of ~120 GB of fastq.gz files.

However, when I evaluated the results using yak on both HG002.asm.hic.hap1.p_ctg.gfa.fa and HG002.asm.hic.hap2.p_ctg.gfa.fa, I observed the following results:

For hap1:
W 355602 3798594 0.093614
H 390793 3798921 0.102869
N 2283539 1515386 0.398899

For hap2:
W 394566 3092510 0.127588
H 442218 3092796 0.142983
N 1668913 1423885 0.460387
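For reference, the summary records above can be cross-checked with a short script. This is a minimal sketch, not part of yak: the function names are hypothetical, and the field interpretation is an assumption (W as switch errors and H as Hamming errors, each given as count, total, rate; N as two hap-mer counts followed by the share of the second). The only thing it relies on is that the third number in each record equals that ratio, which holds for the values in this post.

```python
# Sketch: parse yak trioeval summary records (W/H/N) and recompute the rates.
# Assumption: each record is "TAG int int float", as in the values quoted above.

HAP1 = "W 355602 3798594 0.093614 H 390793 3798921 0.102869 N 2283539 1515386 0.398899"
HAP2 = "W 394566 3092510 0.127588 H 442218 3092796 0.142983 N 1668913 1423885 0.460387"


def parse_trioeval_summary(text):
    """Return {tag: (first_count, second_count, reported_value)} from W/H/N records."""
    tokens = text.split()
    records = {}
    # Records are groups of four whitespace-separated tokens, on one line or several.
    for i in range(0, len(tokens) - 3, 4):
        tag = tokens[i]
        records[tag] = (int(tokens[i + 1]), int(tokens[i + 2]), float(tokens[i + 3]))
    return records


def recompute_rates(records):
    """Recompute the third column of each record from its first two columns."""
    w_err, w_total, _ = records["W"]
    h_err, h_total, _ = records["H"]
    n_first, n_second, _ = records["N"]
    return {
        "W": w_err / w_total,                 # assumed: switch error rate
        "H": h_err / h_total,                 # assumed: Hamming error rate
        "N": n_second / (n_first + n_second), # assumed: share of the second hap-mer set
    }


if __name__ == "__main__":
    for name, text in (("hap1", HAP1), ("hap2", HAP2)):
        records = parse_trioeval_summary(text)
        rates = recompute_rates(records)
        for tag in ("W", "H", "N"):
            print(f"{name} {tag}: reported={records[tag][2]:.6f} recomputed={rates[tag]:.6f}")
```

Read this way, both haplotypes contain large amounts of hap-mers from both parents (about 40% and 46% from the second set), which is consistent with the high W and H values.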

The commands I used for yak were:

yak count -b37 -t 32 -o maternal.yak HG003_HiSeq30x_subsampled_R1_QC.fq HG003_HiSeq30x_subsampled_R2_QC.fq
yak count -b37 -t 32 -o paternal.yak HG004_HiSeq30x_subsampled_R1_QC.fq HG004_HiSeq30x_subsampled_R2_QC.fq

yak trioeval -e -t 32 maternal.yak paternal.yak ../HG002.asm.hic.hap2.p_ctg.gfa.fa > hap2.txt
yak trioeval -e -t 32 maternal.yak paternal.yak ../HG002.asm.hic.hap1.p_ctg.gfa.fa > hap1.txt

The short reads (HG003 and HG004) are from the datasets provided in the hifiasm paper.

I am wondering if there might be an issue in my workflow that could explain such high hamming error and switch error rates.

Thank you very much for your time and help.

Best regards,
yichen

Yichen-HNU, Aug 10 '25 09:08

I guess something is wrong with the input data. Even if hifiasm made mistakes, the switch error rate should not be this high.

chhylp123, Aug 11 '25 16:08

> I guess something is wrong with the input data. Even if hifiasm made mistakes, the switch error rate should not be this high.

I have checked the downloaded data source, and there should be no issue with data corruption or download errors. In addition, I performed a basic quality control using fastp, keeping only reads with Q20 or higher.

My guess is that the issue might be related to adapter trimming: the short-read data contain adapters, and I may have overlooked this step.

I plan to run FastQC for further quality assessment to investigate this in more detail. Thank you very much for your reply and your help.
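As a quick check before a full FastQC run, the share of reads that still carry adapter sequence can be estimated directly. This is a minimal sketch under stated assumptions: the function names are hypothetical, the FASTQ is assumed to use strict 4-line records, and the search string is the common 13 bp prefix of the Illumina TruSeq adapters, which may not be the adapter actually used for these libraries.

```python
# Sketch: estimate the fraction of reads containing an adapter prefix.
# Assumptions: 4-line FASTQ records; adapter is the common Illumina TruSeq prefix.
import gzip
import sys

ADAPTER_PREFIX = "AGATCGGAAGAGC"


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def adapter_fraction(handle, adapter=ADAPTER_PREFIX, max_reads=100_000):
    """Return the fraction of the first max_reads reads whose sequence contains adapter."""
    total = 0
    hits = 0
    for line_no, line in enumerate(handle):
        # In a 4-line record, the sequence is the second line (index 1).
        if line_no % 4 != 1:
            continue
        total += 1
        if adapter in line.upper():
            hits += 1
        if total >= max_reads:
            break
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Usage: python adapter_check.py reads_R1.fq [reads_R2.fq ...]
    for path in sys.argv[1:]:
        with open_fastq(path) as handle:
            print(f"{path}\t{adapter_fraction(handle):.4f}")
```

A clearly non-zero fraction in the QC'd files would suggest that adapters were not removed by the earlier filtering step.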

Best regards, Yichen

Yichen-HNU, Aug 12 '25 09:08