Add discussion - polyploid genomes

Open: tallnuttrbgv opened this issue 1 year ago • 1 comment

Hi,

This is not really an issue - could I suggest that the author add a discussion page? There is a lot of interest in assembling phased polyploid genomes at the moment, especially in plants. I have several putatively polyploid plant HiFi data sets. So far I have been using default parameters and only the primary 'p_ctg.gfa' assembly. I do not have Hi-C data and want to try phasing the homoeolog contigs.

The docs suggest: "The *r_utg.gfa and *p_utg.gfa are lossless so that they also work for polyploid genomes. However, currently the contig-generation modules of hifiasm are designed for diploid samples, which means both the partially phased assembly and the fully-phased assembly does not directly support polyploid genomes." For --n-hap, the docs currently say: "If it is set to >2, the quality of primary assembly for polyploid genomes might be improved. Please use primary assembly for polyploid samples and run multiple rounds of purging steps using third-party tools such as purge_dups."

But it is not clear how the latter suggestion is meant to work. Is this on p_utg.gfa or the p_utg.gfa from the --primary assembly?

Thanks,

Theo

tallnuttrbgv, Feb 12 '24 23:02

Sorry for the late reply; I was too busy during the last few weeks. Basically, you could try setting --n-hap to > 2, which helps keep haplotype information within the assembly graph. If the heterozygosity rate of your sample is very high, one option is to run hifiasm with -l0 and then use p_ctg directly. However, if the heterozygosity rate is not so high, it is quite challenging to get a haplotype-resolved assembly. In that case, it would be better to get a primary assembly and purge it with purge_dups.
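The two scenarios above can be sketched as command lines. This is one possible reading of the reply, not a recommended recipe: the read file name, output prefixes, thread count, and `--n-hap 4` (i.e. an assumed tetraploid) are hypothetical placeholders, and combining `--n-hap` with `-l0` or `--primary` in a single run is an assumption. The helper only builds and prints the commands; it does not run hifiasm.

```python
import shlex


def hifiasm_cmd(reads, prefix, threads=32, n_hap=None, purge_level=None, primary=False):
    """Build a hifiasm argument list. Nothing is executed here."""
    cmd = ["hifiasm", "-o", prefix, "-t", str(threads)]
    if n_hap is not None:
        # --n-hap: assumed number of haplotypes (> 2 for polyploids)
        cmd += ["--n-hap", str(n_hap)]
    if purge_level is not None:
        # -l0 disables hifiasm's internal purging of duplicated haplotigs
        cmd.append(f"-l{purge_level}")
    if primary:
        # --primary: write a primary and an alternate assembly
        cmd.append("--primary")
    cmd.append(reads)
    return cmd


# Hypothetical inputs: adjust the file name, prefixes and ploidy to your sample.
READS = "reads.hifi.fq.gz"

# High heterozygosity: no internal purging, then use p_ctg directly.
high_het = hifiasm_cmd(READS, "sample_l0", n_hap=4, purge_level=0)

# Lower heterozygosity: primary assembly, to be purged afterwards with purge_dups.
low_het = hifiasm_cmd(READS, "sample_primary", n_hap=4, primary=True)

print(shlex.join(high_het))
print(shlex.join(low_het))
```

Printing the commands first makes it easy to check them before passing either list to `subprocess.run`.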

chhylp123, Feb 15 '24 05:02
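For readers following the purge_dups route: hifiasm writes its assemblies as GFA, while purge_dups works on FASTA, so the primary contigs need converting first. Below is a minimal sketch of that conversion, assuming the usual GFA layout in which each `S` line holds the segment name in column 2 and the sequence in column 3. The function name and the toy records are illustrative only, not part of either tool.

```python
def gfa_to_fasta(gfa_lines):
    """Yield one FASTA record per GFA segment ('S') line."""
    for line in gfa_lines:
        if not line.startswith("S\t"):
            continue  # skip link (L), alignment (A) and header lines
        fields = line.rstrip("\n").split("\t")
        name, seq = fields[1], fields[2]
        if seq == "*":
            continue  # sequence not stored in this GFA
        yield f">{name}\n{seq}\n"


# Toy input standing in for a p_ctg.gfa file (made-up names and sequences).
example = [
    "S\tptg000001l\tACGTACGT\tLN:i:8\n",
    "A\tptg000001l\t0\t+\tread1\t0\t8\t0\tHG:A:a\n",
    "S\tptg000002l\tGGCC\tLN:i:4\n",
]
records = list(gfa_to_fasta(example))
print("".join(records), end="")
```

On a real assembly the same generator can be fed an open file handle, with the records written to a `.fa` file that is then passed to purge_dups.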