-s option and high heterozygosity

Open ptranvan opened this issue 2 years ago • 12 comments

Hi,

My species is triploid and highly heterozygous. I used

hifiasm --primary --n-hap 3 -t 24 -o out.asm .*.fastq.gz

But the assembly size of my primary contigs (240 Mbp) is much larger than the GenomeScope estimate (140 Mbp).

http://qb.cshl.edu/genomescope/genomescope2.0/analysis.php?code=nnC4CPmgLE3605rbyM7y

I saw in the docs that the -s option can be adjusted. Do you have a recommendation for the value I should set?

And/or do you have recommendations about other options?

Thanks !

ptranvan avatar Sep 04 '21 11:09 ptranvan

Could you please also set --n-hap 3? The default purging step has a diploid assumption.

chhylp123 avatar Sep 04 '21 14:09 chhylp123

Yes, I did set --n-hap 3. Look at my command :)

ptranvan avatar Sep 04 '21 14:09 ptranvan

Sorry about that. In this case you should probably try purge_dups. I guess you would need to run multiple rounds of purge_dups for triploid samples. Hifiasm does only one round of purging, so it may not be able to produce a proper primary assembly.
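A rough sketch of what repeated purge_dups rounds could look like, following the command sequence in the purge_dups README. The file names (`out.asm.p_ctg.gfa`, `reads.fastq.gz`, `round1.fa`, `round2.fa`), the `THREADS` variable, and the two helper function names are assumptions for illustration. The cutoffs that `calcuts` derives automatically may not suit a triploid coverage profile, so they are worth checking by hand after each round.

```shell
# Convert the segment (S) lines of a hifiasm GFA into FASTA records.
gfa_to_fasta() {
    awk '/^S/ {print ">" $2; print $3}' "$1"
}

# One round of purge_dups (hypothetical wrapper).
# $1 = assembly FASTA, $2 = HiFi reads, $3 = purged FASTA for this round.
# Intermediate files go to the current directory and are overwritten.
purge_round() {
    minimap2 -x map-hifi -t "${THREADS:-24}" "$1" "$2" | gzip -c > reads.paf.gz &&
    pbcstat reads.paf.gz &&
    calcuts PB.stat > cutoffs 2> calcuts.log &&
    split_fa "$1" > asm.split.fa &&
    minimap2 -x asm5 -DP -t "${THREADS:-24}" asm.split.fa asm.split.fa | gzip -c > self.paf.gz &&
    purge_dups -2 -T cutoffs -c PB.base.cov self.paf.gz > dups.bed 2> purge_dups.log &&
    get_seqs -e dups.bed "$1" &&
    mv purged.fa "$3"
}

# Example (not run here; requires minimap2 and purge_dups on PATH):
#   gfa_to_fasta out.asm.p_ctg.gfa > p_ctg.fa
#   purge_round p_ctg.fa reads.fastq.gz round1.fa
#   purge_round round1.fa reads.fastq.gz round2.fa
```

Comparing the total size of each round's output against the expected haploid genome size gives a simple signal for when to stop.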

chhylp123 avatar Sep 04 '21 14:09 chhylp123

Thanks, I will take a look. What about the -s option? Is it useless for triploids?

ptranvan avatar Sep 04 '21 15:09 ptranvan

It is the similarity threshold used to find overlaps between different haplotypes. The default, -s 0.55, is usually fine. If the heterozygosity rate is too high, you can set a smaller value.
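One way to try smaller values is a small sweep over -s, keeping the other options from the original command. The sketch below only prints the commands so they can be checked first; the values 0.45 and 0.35 are arbitrary illustrations, not recommendations from this thread, and `READS`, `THREADS`, the output prefixes, and the function name are placeholders.

```shell
# Placeholders (assumptions): adjust to your own data and machine.
READS="${READS:-reads.fastq.gz}"
THREADS="${THREADS:-24}"

# Print a hifiasm command for one -s value ($1).
# The output prefix records the value so runs do not overwrite each other.
hifiasm_cmd() {
    printf 'hifiasm --primary --n-hap 3 -s %s -t %s -o out_s%s.asm %s\n' \
        "$1" "$THREADS" "$1" "$READS"
}

# Default first, then two smaller illustrative values.
for s in 0.55 0.45 0.35; do
    hifiasm_cmd "$s"
done
```

Piping the printed lines to `sh` would run the assemblies one after another; the primary contig size of each run can then be compared against the GenomeScope estimate.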

chhylp123 avatar Sep 04 '21 15:09 chhylp123

So will setting --n-hap 3 produce a three-haplotype assembly? I was just about to post a question about tetraploid assembly, so I want to try --n-hap 4 with Hi-C.

Thanks, KF

kevfengler227 avatar Sep 08 '21 20:09 kevfengler227

It does not work for polyploid samples right now. Setting --n-hap to 3 or 4 only disables the diploid assumption during graph cleaning.

chhylp123 avatar Sep 08 '21 21:09 chhylp123

OK, thanks. Polyploids are definitely the next challenge to overcome. I'll look forward to this capability in hifiasm as I have a lot of polyploids to do!

KF

kevfengler227 avatar Sep 08 '21 22:09 kevfengler227

Yeah, polyploids are interesting, but we don't have polyploid data for testing and debugging...

chhylp123 avatar Sep 08 '21 22:09 chhylp123

Polyploids would definitely be "the feature": What would you need? Would ccs data be enough?

BjoernUsadel avatar Sep 09 '21 13:09 BjoernUsadel

Thanks for the help! For us, it would be good to get HiFi, Hi-C, and one type of ground truth. We need ground truth to evaluate results on polyploid samples.

chhylp123 avatar Sep 10 '21 21:09 chhylp123

Thanks for the information here. I'm working on an AAB-type triploid genome. I'd like to have the haplotypes phased. So currently, what would be the best practice using hifiasm for a triploid species? How about this:

  1. Use --n-hap 3, which would help graph cleaning and so give high-quality p_utgs.
  2. Use -l0 (no purging at all) to keep all unitigs.
  3. Use the extracted unitigs and Hi-C data for scaffolding with external Hi-C programs.
  4. Use the information in the assembly graph (and the reads) to fill as many gaps as possible in the assembly from the previous step.

What do you think of this, and what else might be helpful? Thanks a lot, I would like to hear your opinion.

Best, Tao

btw, could you take a quick look at my run log and assembly graph (p_utg) to see if something is very wrong? My genome size is around 2G (3 haplotypes in total). Thanks!

run_log.txt assemble_graph
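A quick sanity check on a unitig or contig graph is to sum the segment lengths and compare the total with the expected genome size. A minimal sketch, assuming the GFA stores each sequence in the third column of its S lines; the function name and the example file name are hypothetical, and the exact name of the p_utg file differs between hifiasm versions and modes.

```shell
# Sum the lengths of all segment (S) sequences in a GFA file, in bp.
# printf with %.0f avoids scientific notation for multi-Gbp totals.
gfa_total_bp() {
    awk '/^S/ {n += length($3)} END {printf "%.0f\n", n}' "$1"
}

# Example (hypothetical file name):
#   gfa_total_bp out.asm.p_utg.gfa
```

With -l0 and three haplotypes kept, a total close to the full ~2G would be the expectation; a much larger number would suggest redundant or unresolved unitigs in the graph.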

zhaotao1987 avatar Dec 28 '21 15:12 zhaotao1987