hifiasm
How do I evaluate my genome?
Hello, I have successfully completed the assembly and now want to evaluate its quality. The software I plan to use is BUSCO. How do I convert the GFA output files for further analysis? These are the files I have:
NA12878.asm.ec.bin
NA12878.asm.hic.hap1.p_ctg.g
NA12878.asm.hic.hap1.p_ctg.lowQ.be
NA12878.asm.hic.hap1.p_ctg.noseq.gfa
NA12878.asm.hic.hap2.p_ctg.g
NA12878.asm.hic.hap2.p_ctg.lowQ.be
NA12878.asm.hic.hap2.p_ctg.noseq.gfa
NA12878.asm.hic.lk.bin
NA12878.asm.hic.r_utg.gf
NA12878.asm.hic.r_utg.lowQ.be
NA12878.asm.hic.r_utg.noseq.gfa
NA12878.asm.hic.tlb.bin
NA12878.asm.ovlp.reverse.bin
NA12878.asm.ovlp.source.bin
I used the following command, but I am not sure whether it is correct: awk '$1 ~/S/ {print ">"$2"\n"$3}' reads.gfa > reads.fasta
The line below is given in the README (see point "Getting started") of this repository to go from GFA to FASTA...
awk '/^S/{print ">"$2;print $3}' test.p_ctg.gfa > test.p_ctg.fa # get primary contigs in FASTA
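As a minimal sketch of what that README line does, the snippet below runs it on a toy GFA and checks that every segment (S line) became one FASTA record. The file names `toy.p_ctg.gfa` and `toy.p_ctg.fa` and the sequences are made up for illustration. On real output, the input should be a GFA that actually contains sequences; as far as I know, the `*.noseq.gfa` files carry `*` in place of the sequence, so they would not give a usable FASTA.

```shell
# Toy GFA (hypothetical data): two segments (S lines) and one link (L line).
printf 'S\tctg1\tACGTACGT\nS\tctg2\tGGCC\nL\tctg1\t+\tctg2\t+\t0M\n' > toy.p_ctg.gfa

# Same conversion as the README line: each S line becomes one FASTA record.
awk '/^S/{print ">"$2;print $3}' toy.p_ctg.gfa > toy.p_ctg.fa

# Sanity check: the number of FASTA headers should equal the number of S lines.
n_seg=$(grep -c '^S' toy.p_ctg.gfa)
n_rec=$(grep -c '^>' toy.p_ctg.fa)
echo "segments=$n_seg records=$n_rec"
```

The command in the question, which matches on `$1 ~ /S/`, should select the same lines in practice, because GFA record types are single letters in the first column.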
Thank you very much for your reply. I have one more question: how do I evaluate the quality of our genome assembly? At present I want to use BUSCO, but I find the software very difficult to use. Is there any other method or script for this?
Hi-C data is probably useful for evaluating completeness and detecting misassemblies.
Merqury is also quite popular for assembly evaluation: https://github.com/marbl/merqury
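If a quick script is all that is needed for a first look, basic contiguity numbers (contig count, total length, N50) can be computed from the FASTA with awk alone. This is only a rough sketch on a made-up file, `toy.fa`, and it does not replace BUSCO or Merqury: it says nothing about completeness or base accuracy.

```shell
# Toy FASTA (hypothetical data): contigs of 8, 4, and 2 bases.
printf '>ctg1\nACGTACGT\n>ctg2\nGGCC\n>ctg3\nAT\n' > toy.fa

# One length per contig (multi-line records are summed), sorted longest first.
awk '/^>/{if (len) print len; len=0; next} {len+=length($0)} END{if (len) print len}' toy.fa \
  | sort -rn > toy.lengths.txt

# N50: the contig length at which the running sum, taken from the longest
# contig down, first reaches at least half of the total assembly length.
stats=$(awk '{len[NR]=$1; total+=$1}
  END{run=0
      for (i=1; i<=NR; i++) { run+=len[i]; if (run*2>=total) { n50=len[i]; break } }
      printf "contigs=%d total=%d longest=%d N50=%d", NR, total, len[1], n50}' toy.lengths.txt)
echo "$stats"
```

For the toy input this prints `contigs=3 total=14 longest=8 N50=8`. On a real assembly, replace `toy.fa` with the FASTA produced from the primary or haplotype contig GFA.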