How to choose the final genome output file for a homozygous genome with Hi-C

Open Lillian-21 opened this issue 2 years ago • 17 comments

I have a homozygous plant species (estimated genome size 1.1G). First, I ran hifiasm on the HiFi reads (33G) with hifiasm-0.16.1/hifiasm -o out.fasta -t 16 -l0 input.fasta. I got a 1.8G p_ctg (N50 2.3M, not very good) and a 158M a_ctg. Then I used p_ctg to run Hi-C phasing: hifiasm-0.16.1/hifiasm -l0 -o p_ctg.fa -t 24 --h read1.fq.gz -h2 read2.fq.gz. I got a 1.3G hic.p_ctg.fa, a 1.3G hap1.p_ctg.fa, and a 1.2G hap2.p_ctg.fa. Now I do not know which one I should use for the next analysis. Which one is the whole genome? How should I understand hap1 and hap2? Besides, I get an abnormal k-mer plot.

20211019102143 20211019102215 20211019102235

Lillian-21 avatar Oct 19 '21 02:10 Lillian-21

Does your sample have just one haplotype with 8x coverage?

chhylp123 avatar Oct 19 '21 02:10 chhylp123

Hi chhylp123,

How important is phasing, and what is the difference between phasing a homozygous genome and phasing a heterozygous genome?

B10inform avatar Mar 11 '22 19:03 B10inform

I guess there is no need to phase homozygous genomes, as they only have one haplotype?

chhylp123 avatar Mar 11 '22 19:03 chhylp123

Should homozygous genomes be purged? Without purging, the assembly would contain lots of duplicates.

Heterozygous: genome size 600Mb; phasing gives hap1 ~ 300Mb and hap2 ~ 300Mb; both are used for downstream analysis.

Homozygous: genome size 600Mb (if phasing gives hap1 ~ 300Mb and hap2 ~ 300Mb); use 600Mb for downstream analysis?

Thanks

B10inform avatar Mar 11 '22 20:03 B10inform

Just to make sure: what are the differences between the homozygous genome and the heterozygous genome? If a genome is homozygous, hifiasm with -l0 should not produce an assembly that includes lots of duplicates.

chhylp123 avatar Mar 11 '22 21:03 chhylp123

Property min max
Homozygous (aa) 99.09% 99.11%
Heterozygous (ab) 0.88% 0.90%

hifiasm -o sample.asm -l0 sample.fastq

.p_ctg.fasta + .a_ctg.fasta = .pa_ctg.fasta (600Mb), so the genome size is 600Mb.

If phased: hap1 ~ 300Mb, hap2 ~ 300Mb.

What should I use for downstream analysis: 600Mb or ~300Mb?

Thanks

B10inform avatar Mar 11 '22 21:03 B10inform

Could you please check the assembly graph of the homozygous genome? If the graph has a lot of small bubbles, it is more likely to be a heterozygous genome with low heterozygosity. In that case, I would recommend using the phased assemblies.

chhylp123 avatar Mar 11 '22 21:03 chhylp123

This is what the assembly graph looks like.

image

B10inform avatar Mar 11 '22 22:03 B10inform

I mean the p_utg.noseq.gfa, which can be visualized with Bandage. But at least from your k-mer plot, it is a heterozygous genome.

chhylp123 avatar Mar 11 '22 22:03 chhylp123
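
Besides looking at the graph in Bandage, a rough numeric summary of a GFA file can help when comparing two samples. The following is a minimal sketch, not part of hifiasm: it assumes a GFA 1 file with tab-separated `S` (segment) and `L` (link) lines, and the function name `summarize_gfa` and the toy graph are hypothetical. It counts segment ends that have more than one successor, which is only a crude proxy for bubbles; it does not perform real bubble detection.

```python
import io
from collections import defaultdict


def summarize_gfa(handle):
    """Count segments, link lines, and branching segment ends in a GFA 1 file."""
    flip = {"+": "-", "-": "+"}
    segments = set()
    # Successors of each oriented segment, e.g. ("utg1", "+").
    successors = defaultdict(set)
    n_links = 0
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            segments.add(fields[1])
        elif fields[0] == "L":
            src, src_orient, dst, dst_orient = fields[1:5]
            n_links += 1
            successors[(src, src_orient)].add((dst, dst_orient))
            # Every link also implies its reverse-complement link.
            successors[(dst, flip[dst_orient])].add((src, flip[src_orient]))
    branching = sum(1 for targets in successors.values() if len(targets) > 1)
    return {"segments": len(segments), "links": n_links, "branching_ends": branching}


# Toy graph (hypothetical): one simple bubble, utg1 -> {utg2, utg3} -> utg4.
TOY_GFA = "\n".join([
    "S\tutg1\t*\tLN:i:1000",
    "S\tutg2\t*\tLN:i:200",
    "S\tutg3\t*\tLN:i:200",
    "S\tutg4\t*\tLN:i:1000",
    "L\tutg1\t+\tutg2\t+\t0M",
    "L\tutg1\t+\tutg3\t+\t0M",
    "L\tutg2\t+\tutg4\t+\t0M",
    "L\tutg3\t+\tutg4\t+\t0M",
]) + "\n"

print(summarize_gfa(io.StringIO(TOY_GFA)))
```

For a real file, pass an open file handle instead, for example `summarize_gfa(open("sample.asm.bp.p_utg.noseq.gfa"))`, where the file name is an assumption about the output prefix.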

Well, what are the genome size estimated from k-mers and the BUSCO scores? I guess it should be a heterozygous genome with high heterozygosity. But there is a slight possibility that this genome is homozygous.

chhylp123 avatar Mar 11 '22 22:03 chhylp123

This is p_ctg.noseq.gfa image

B10inform avatar Mar 11 '22 22:03 B10inform

The estimated genome size is 301Mb.

With k-mer GenomeScope: image

B10inform avatar Mar 11 '22 22:03 B10inform

Then I guess it is a heterozygous genome.

chhylp123 avatar Mar 11 '22 22:03 chhylp123
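
The reasoning in the exchange above can be written as a small piece of arithmetic: compare the total assembly size with the k-mer-based haploid size estimate. This is only a sketch using the two numbers quoted in this thread (a 600Mb -l0 assembly and a 301Mb estimate); the function name `assembly_to_haploid_ratio` is hypothetical, and the interpretation is a rule of thumb rather than a test.

```python
def assembly_to_haploid_ratio(assembly_size_bp, haploid_estimate_bp):
    """Return total assembly size divided by the estimated haploid genome size."""
    if haploid_estimate_bp <= 0:
        raise ValueError("haploid_estimate_bp must be positive")
    return assembly_size_bp / haploid_estimate_bp


# Values quoted in this thread: 600Mb assembly, 301Mb k-mer estimate.
ratio = assembly_to_haploid_ratio(600e6, 301e6)
print(f"assembly / haploid estimate = {ratio:.2f}")
```

A ratio close to 2 suggests that the unpurged assembly contains both haplotypes, which fits the heterozygous interpretation; a ratio close to 1 would fit a single collapsed haplotype.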

So I should not use -l0, but rather -l3?

B10inform avatar Mar 11 '22 22:03 B10inform

Yes, I guess phased assemblies should work. You could also compare the BUSCO scores to double-check.

chhylp123 avatar Mar 11 '22 22:03 chhylp123
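
One way to compare BUSCO scores between assemblies is to parse the one-line summary and look at the duplicated fraction (D). The sketch below assumes the common `C:..%[S:..%,D:..%],F:..%,M:..%,n:..` summary format; check it against the output of your BUSCO version. The function name `parse_busco_summary` and the two example lines are hypothetical and are not results from this thread.

```python
import re

# Assumed one-line BUSCO summary format; verify against your BUSCO version.
BUSCO_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line):
    """Parse a BUSCO one-line summary into a dict of percentages plus n."""
    match = BUSCO_PATTERN.search(line)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {line!r}")
    scores = {key: float(value) for key, value in match.groupdict().items() if key != "n"}
    scores["n"] = int(match.group("n"))
    return scores


# Hypothetical summaries for an unpurged assembly and one phased haplotype.
examples = {
    "unpurged": "C:98.6%[S:20.1%,D:78.5%],F:0.6%,M:0.8%,n:1614",
    "hap1": "C:98.6%[S:97.1%,D:1.5%],F:0.6%,M:0.8%,n:1614",
}
for name, line in examples.items():
    scores = parse_busco_summary(line)
    print(name, "complete:", scores["C"], "duplicated:", scores["D"])
```

A much higher duplicated fraction in one assembly than in the other, at similar completeness, usually points to retained copies of both haplotypes rather than true gene duplication.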

Some confusion:

Heterozygous species:

GenomeScope:

Property min
Homozygous (aa) 96.44%
Heterozygous (ab) 3.51%

image

Assembly graph image

Homozygous species:

GenomeScope graph and table show homozygous:

Property min
Homozygous (aa) 99.09%
Heterozygous (ab) 0.88%

image

Assembly graph image

They look totally different. How can I reliably confirm heterozygosity and homozygosity?

Thanks

B10inform avatar Mar 11 '22 23:03 B10inform

The reason is that GenomeScope and hifiasm use k-mers of different lengths. I guess most genomes are heterozygous, apart from some very special ones. For those special genomes, you would normally know in advance that they are homozygous.

chhylp123 avatar Mar 11 '22 23:03 chhylp123
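
To illustrate why k-mer length matters, here is a back-of-the-envelope sketch. It assumes heterozygous sites are independent and spread uniformly at a per-base rate r, so the fraction of k-mers that span at least one heterozygous site is 1 - (1 - r)^k. The values k=21 (a common choice for GenomeScope input) and k=51 (the hifiasm default for -k) are assumptions, since the thread does not state which lengths were used; r = 0.0088 is the 0.88% reported above. The function name is hypothetical.

```python
def fraction_kmers_spanning_variant(het_rate, k):
    """Fraction of k-mers overlapping >= 1 heterozygous site, assuming independent sites."""
    if not 0.0 <= het_rate <= 1.0:
        raise ValueError("het_rate must be between 0 and 1")
    return 1.0 - (1.0 - het_rate) ** k


HET_RATE = 0.0088  # 0.88% per-base heterozygosity, from the GenomeScope table above
for k in (21, 51):  # assumed k-mer lengths, not stated in the thread
    fraction = fraction_kmers_spanning_variant(HET_RATE, k)
    print(f"k={k}: {fraction:.1%} of k-mers span a heterozygous site")
```

Under these assumptions, the same per-base rate affects a noticeably larger share of k-mers at the longer length, which is one reason a sample that looks nearly homozygous in one tool can still look clearly heterozygous in another.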