hifiasm
No log file and unclear data
I am using PacBio HiFi reads from a yeast genome for this assembly, and I would like to understand what the data is telling me: whether the sequencing run was bad, whether my data isn't good, whether everything is normal, whether this is a good tool to use for my data going forward, etc.
However, my run did not produce a log file, and I'm having trouble interpreting these results.
This was the line of code I ran: ./hifiasm -o 088_out.asm 088.fastq.gz
And this is what the 088_out.asm.bp.p_utg.gfa file looks like when I visualize it using Bandage:
If someone could please let me know what these graphs mean and if there is a log file hidden somewhere that would be great!
The log file will be output to stderr. Could you please show the assembly metrics of p_ctg.gfa?
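Since the log goes to stderr, it is lost when the terminal closes unless it is redirected. In a shell, appending `2> 088_out.asm.log` to the command is enough (the log file name is only an example). The same idea as a minimal Python sketch: `run_with_stderr_log` is a hypothetical helper, and a stand-in command replaces hifiasm so the snippet runs without it installed.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_stderr_log(cmd, log_path):
    """Run cmd and write everything it prints on stderr to log_path."""
    with open(log_path, "w") as log:
        result = subprocess.run(cmd, stderr=log)
    return result.returncode


# Stand-in command (an assumption for the demo). For the run in this thread,
# cmd would be: ["./hifiasm", "-o", "088_out.asm", "088.fastq.gz"]
demo_cmd = [
    sys.executable,
    "-c",
    "import sys; print('k-mer histogram goes here', file=sys.stderr)",
]

log_file = Path(tempfile.mkdtemp()) / "demo.log"
exit_code = run_with_stderr_log(demo_cmd, log_file)
log_text = log_file.read_text().strip()
print(exit_code, log_text)
```

With the log saved to a file, the k-mer plots can be read after the run has finished, even if the terminal was closed in the meantime.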
Thank you! I had to close my terminal after starting the run, so I missed the k-mer plots printed to stderr.
This is what it looked like when I re-ran the code:
This was the first round:
And the last round:
Do these look like good/normal k-mer plots?
My assembly metrics were: Nodes: 70 Edges: 1 Total length: 16,052,665 bp
I believe that is what you are asking for? Or is it something else?
The log file looks fine. Both the N50 and the assembly size are useful to check the results. A good assembly should have a similar size to the estimated genome, and the N50 will be tens of Mb.
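Both numbers can be computed directly from the segment (S) lines of a GFA file. A minimal sketch, assuming the usual GFA layout in which the third tab-separated field is the sequence, or `*` with an `LN:i:` length tag; the helper names and the toy input are made up for illustration.

```python
def gfa_segment_lengths(gfa_lines):
    """Return the length of every segment (S line) in a GFA file."""
    lengths = []
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "S" or len(fields) < 3:
            continue  # skip links (L), other record types, and malformed lines
        sequence = fields[2]
        if sequence != "*":
            lengths.append(len(sequence))
            continue
        # Sequence omitted: fall back to the LN:i:<length> tag if present.
        for tag in fields[3:]:
            if tag.startswith("LN:i:"):
                lengths.append(int(tag[len("LN:i:"):]))
                break
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


# Toy input standing in for a real file (assumed name for this run:
# 088_out.asm.bp.p_ctg.gfa, read with open(...) and passed line by line).
toy_gfa = [
    "S\tctg1\t" + "A" * 50 + "\n",
    "S\tctg2\t" + "C" * 30 + "\n",
    "S\tctg3\t*\tLN:i:20\n",
    "L\tctg1\t+\tctg2\t+\t0M\n",
]
lengths = gfa_segment_lengths(toy_gfa)
print(len(lengths), sum(lengths), n50(lengths))
```

The three printed values are the number of contigs, the total assembly size, and the N50, which can then be compared against the estimated genome size.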
Glad to hear the log file looks fine. This is a yeast genome, which is normally about 12Mb, but I believe it is also tetraploid, meaning it has 4 copies of each chromosome, which would give a total genome size of 48Mb. Since this is only 16Mb, should I be worried that it's not the entire sequence? Or do I only have to consider one haplotype (1 copy of each chromosome), in which case it's longer than expected (16Mb vs 12Mb)? The N50 size is 542,165 bp.
Sorry I missed your issue... It depends on which assembly you are using (see: https://hifiasm.readthedocs.io/en/latest/interpreting-output.html). Ideally, the p_ctg should be around 12Mb, and the a_ctg should be a little bit smaller than 36Mb if your sample is tetraploid. p_ctg should represent the entire sequence of one haplotype.
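The arithmetic behind those numbers, as a rough sketch. It assumes the simplification implied above: the primary assembly covers one haplotype and the alternate assembly covers at most the remaining ones. `expected_assembly_sizes` is a hypothetical helper, not part of hifiasm.

```python
def expected_assembly_sizes(haploid_mb, ploidy):
    """Rough size expectations (in Mb) for primary and alternate assemblies."""
    return {
        "p_ctg": haploid_mb,                            # one full haplotype
        "a_ctg_upper_bound": haploid_mb * (ploidy - 1),  # remaining haplotypes
        "all_haplotypes": haploid_mb * ploidy,           # every copy combined
    }


# Values from this thread: ~12 Mb haploid yeast genome, tetraploid sample.
sizes = expected_assembly_sizes(12, 4)
print(sizes)
```

In practice the alternate assembly tends to come out below its upper bound, which matches the "a little bit smaller than 36Mb" expectation.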