hifiasm
-s option and high heterozygosity
Hi,
My species is triploid and highly heterozygous. I used
hifiasm --primary --n-hap 3 -t 24 -o out.asm .*.fastq.gz
But the assembly size of my primary contigs is much higher (240 Mbp) than the GenomeScope estimate (140 Mbp).
http://qb.cshl.edu/genomescope/genomescope2.0/analysis.php?code=nnC4CPmgLE3605rbyM7y
I saw in the docs that the -s option can be adjusted. Do you have a recommendation for the value I should set?
And/or do you have recommendations about other options?
Thanks !
Could you please also set --n-hap 3? The default purging step has a diploid assumption.
Yes, I did set --n-hap 3. Look at my command :)
Sorry for that. In this case you should probably try purge_dups. I guess you should run multiple rounds of purge_dups for triploid samples. Hifiasm does only one round of purging, so it may not be able to produce the primary assembly properly.
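For anyone wanting to try the multi-round idea, here is a minimal dry-run sketch of how repeated purge_dups rounds could be chained. It is not a tested recipe for triploids. The paths, the number of rounds and the thread count are assumptions, the input is assumed to be the primary contigs already converted from GFA to FASTA, and the tool invocations follow the purge_dups README as I recall it (the `map-hifi` preset needs a recent minimap2), so check them against your installed versions. Commands are only printed unless `RUN=1` is set.

```shell
#!/bin/sh
# Dry-run sketch: each command is printed, and executed only when RUN=1.
set -eu

THREADS="${THREADS:-24}"

run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then sh -c "$*"; fi
}

# purge_round ASM READS DIR: one purge_dups round.
# ASM, READS and DIR should be absolute paths, because some steps cd into DIR.
purge_round() {
  asm="$1"; reads="$2"; dir="$3"
  run "mkdir -p $dir"
  # Read depth: map HiFi reads to the assembly, then derive coverage cutoffs.
  run "minimap2 -x map-hifi -t $THREADS $asm $reads | gzip -c - > $dir/reads.paf.gz"
  run "cd $dir && pbcstat reads.paf.gz && calcuts PB.stat > cutoffs 2> calcuts.log"
  # Self-alignment of the split assembly.
  run "split_fa $asm > $dir/asm.split.fa"
  run "minimap2 -x asm5 -DP -t $THREADS $dir/asm.split.fa $dir/asm.split.fa | gzip -c - > $dir/self.paf.gz"
  # Flag duplicated haplotigs, then write purged.fa and hap.fa into DIR.
  run "cd $dir && purge_dups -2 -T cutoffs -c PB.base.cov self.paf.gz > dups.bed 2> purge_dups.log"
  run "cd $dir && get_seqs -e dups.bed $asm"
}

ASM="${ASM:-/data/out.asm.p_ctg.fa}"    # hypothetical path
READS="${READS:-/data/reads.fastq.gz}"  # hypothetical path
ROUNDS="${ROUNDS:-2}"                   # assumed; a triploid may need more or fewer

current="$ASM"
for i in $(seq 1 "$ROUNDS"); do
  purge_round "$current" "$READS" "$PWD/purge_round$i"
  current="$PWD/purge_round$i/purged.fa"
done
echo "final assembly: $current"
```

Each round feeds the previous round's `purged.fa` back in. It is probably worth checking the assembly size and the `calcuts` cutoffs after every round rather than fixing the number of rounds in advance.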
Thanks, I will take a look.
What about the -s option? Is it useless for triploids?
It is the similarity threshold for finding overlaps between different haplotypes. The default, -s 0.55, is usually fine. If the heterozygosity rate is very high, you can set a smaller value.
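As a rough illustration of that advice, one could sweep a few -s values below the default and compare the total primary assembly size against the GenomeScope estimate. The sketch below only prints the hifiasm commands. The values 0.45 and 0.35, the reads path and the output prefixes are assumptions for illustration, not recommended settings. `gfa_total_bp` sums the sequence lengths of the S lines in a GFA file.

```shell
#!/bin/sh
set -eu

# hifiasm_cmd S READS: print the hifiasm command for one -s value (nothing is run).
hifiasm_cmd() {
  echo "hifiasm --primary --n-hap 3 -s $1 -t 24 -o out.s$1 $2"
}

# gfa_total_bp FILE: total bases over all segment (S) lines of a GFA file.
gfa_total_bp() {
  awk '$1 == "S" { n += length($3) } END { print n + 0 }' "$1"
}

READS="${READS:-reads.fastq.gz}"   # hypothetical path
for s in 0.55 0.45 0.35; do        # 0.55 is the default; the others are illustrative
  hifiasm_cmd "$s" "$READS"
done
```

After each run, calling `gfa_total_bp` on that run's primary contig GFA gives a number to set against the 140 Mbp estimate.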
So will setting --n-hap 3 produce a three-haplotype assembly? I was just about to post a question about tetraploid assembly, so I want to try --n-hap 4 with Hi-C.
Thanks, KF
Hifiasm is not able to work for polyploid samples right now. Setting --n-hap to 3 or 4 is only used to disable the diploid assumption during graph cleaning.
OK, thanks. Polyploids are definitely the next challenge to overcome. I'll look forward to this capability in hifiasm as I have a lot of polyploids to do!
KF
Yeah, polyploids are interesting, but we don't have polyploid data for testing and debugging...
Polyploids would definitely be "the feature": What would you need? Would ccs data be enough?
Thanks for the help! For us, it would be good to get HiFi, Hi-C, and one type of ground truth. We need ground truth to get a sense of how well things work for polyploid samples.
Thanks for the information here. I'm working on an AAB-type triploid genome. I'd like to have the haplotypes phased. So currently, what would be the best practice using hifiasm for a triploid species? How about this:
- Use --n-hap 3, which should help graph cleaning and so give high-quality p_utgs.
- Use -l0, no purging at all, just to keep all unitigs.
- Use the extracted unitigs and Hi-C data for scaffolding with external Hi-C programs.
- Use the information in the assembly graph (and the reads) to fill as many gaps as possible in the assembly from the previous step.
What do you think of this, and what else might be helpful? Thanks a lot, I would like to hear your opinion.
Best, Tao
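The first three steps of the proposal above could be sketched roughly as follows. This is only an illustration of the plan, not a confirmed best practice. The reads path, the output prefix and the p_utg file name are assumptions (the exact GFA file name depends on the hifiasm version), and the Hi-C scaffolding and gap-filling steps are left out because no specific tools are named. The hifiasm command is printed, not run. The awk conversion from GFA to FASTA is the usual one-liner applied to S lines.

```shell
#!/bin/sh
set -eu

# gfa_to_fasta FILE: write every segment (S line) of a GFA file as a FASTA record.
gfa_to_fasta() {
  awk '$1 == "S" { print ">" $2; print $3 }' "$1"
}

READS="${READS:-reads.fastq.gz}"   # hypothetical path
PREFIX="${PREFIX:-triploid.asm}"   # hypothetical output prefix

# Steps 1 and 2: disable the diploid assumption and skip purging (printed only).
echo "hifiasm --n-hap 3 -l0 -t 24 -o $PREFIX $READS"

# Step 3: extract unitigs for an external Hi-C scaffolder (file name is an assumption).
echo "gfa_to_fasta $PREFIX.p_utg.gfa > $PREFIX.p_utg.fa"
```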
btw, could you take a quick look at my run log and assembly graph (p_utg) to see if something is badly wrong? My genome size is around 2 Gb (3 haplotypes in total). Thanks!
run_log.txt