hifiasm

No log file and unclear data

Open rmormando opened this issue 1 year ago • 5 comments

I am using PacBio HiFi yeast genomes for this assembly and I would like to understand what the data is telling me: if the sequencing job was bad, if my data isn't good, if everything is normal, if this is a good tool to use for my data moving forward, etc.

However, my run did not produce a log file and I'm having trouble interpreting these results.

This was the line of code I ran: ./hifiasm -o 088_out.asm 088.fastq.gz

And this is what the 088_out.asm.bp.p_utg.gfa file looks like when I visualize it using Bandage: [image: p_utg]

If someone could please let me know what these graphs mean and if there is a log file hidden somewhere that would be great!

rmormando avatar Sep 13 '22 16:09 rmormando

The log file will be output to the stderr. Could you please show the assembly metrics of p_ctg.gfa?
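
Because the log goes to stderr rather than to a file, one way to keep it is to redirect stderr when launching the run. The following is a minimal sketch, not an official hifiasm feature: the helper name run_with_log and the log file name 088_out.asm.log are illustrative assumptions.

```shell
#!/bin/sh
# run_with_log LOGFILE CMD [ARGS...]
# Hypothetical helper: runs CMD and writes everything it prints on
# stderr (file descriptor 2) to LOGFILE. stdout is left untouched.
run_with_log() {
    log_file=$1
    shift
    "$@" 2> "$log_file"
}

# Example call, commented out so the snippet runs without hifiasm present.
# The log file name is an arbitrary choice, not something hifiasm creates.
# run_with_log 088_out.asm.log ./hifiasm -o 088_out.asm 088.fastq.gz
```

The log can then be read later with a pager or text editor. If the terminal has to be closed during the run, starting the command under nohup and putting it in the background with & is a common way to keep it alive.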

chhylp123 avatar Sep 14 '22 18:09 chhylp123

Thank you! I had to exit my terminal once I ran the code, so I missed going back to look at the k-mer plots on stderr.

This is what it looked like when I re-ran the code:

This was the first round: [screenshot]

And the last round: [screenshot]

Do these look like good/normal kmer plots?

My assembly metrics were: Nodes: 70, Edges: 1, Total length: 16,052,665 bp

I believe that is what you are asking for? Or is it something else?

rmormando avatar Sep 14 '22 21:09 rmormando

The log file looks fine. Both the N50 and the assembly size are useful to check the results. A good assembly should have a similar size to the estimated genome, and the N50 will be tens of Mb.
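
Both numbers can be read directly from a GFA file. The sketch below is one way to do it with standard tools, assuming each S (segment) line carries the contig sequence in its third column; the function name gfa_stats is hypothetical, and the file name in the example is inferred from the output prefix used in this thread.

```shell
#!/bin/sh
# gfa_stats FILE
# Hypothetical helper: prints contig count, total length, and N50 for a GFA file.
# Assumes S lines hold the sequence itself in column 3 (not a "*" placeholder).
gfa_stats() {
    awk '$1 == "S" { print length($3) }' "$1" |   # one contig length per line
    sort -rn |                                    # longest contig first
    awk '{ len[NR] = $1; total += $1 }
         END {
             if (NR == 0) { print "no segments found"; exit 1 }
             half = total / 2
             # N50: length of the contig at which the running sum
             # first reaches half of the total assembly size
             for (i = 1; i <= NR; i++) {
                 run += len[i]
                 if (run >= half) { n50 = len[i]; break }
             }
             printf "contigs=%d total_bp=%d N50=%d\n", NR, total, n50
         }'
}

# Example call, commented out because the file is not available here:
# gfa_stats 088_out.asm.bp.p_ctg.gfa
```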

chhylp123 avatar Sep 15 '22 00:09 chhylp123

Glad to hear the log file looks fine. This is a yeast genome - which is normally about 12Mb - but I believe it is also tetraploid, meaning it has 4 copies of each chromosome, which would give a total genome size of 48Mb. Seeing as this is only 16Mb, should I be worried it's not the entire sequence? Or do I only have to consider one haplotype (1 copy of each chromosome), which would mean that it's longer than expected (16Mb vs 12Mb)? The N50 size is 542,165 bp.

rmormando avatar Sep 15 '22 14:09 rmormando

Sorry I missed your issue... It depends on which assembly you are using (see: https://hifiasm.readthedocs.io/en/latest/interpreting-output.html). Ideally, the p_ctg should be around 12Mb, and the a_ctg should be a little bit smaller than 36Mb if your sample is tetraploid. p_ctg should represent the entire sequence of one haplotype.
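
The arithmetic behind those figures can be sketched as follows. The function name expected_sizes is hypothetical, and the inputs (12 Mb haploid size, ploidy 4) are the values quoted in this thread rather than measurements.

```shell
#!/bin/sh
# expected_sizes HAPLOID_MB PLOIDY
# Hypothetical helper: rough expected assembly sizes for a polyploid sample.
expected_sizes() {
    haploid_mb=$1
    ploidy=$2
    total_mb=$((haploid_mb * ploidy))   # all haplotypes together
    alt_mb=$((total_mb - haploid_mb))   # upper bound for the alternate contigs
    echo "total=${total_mb}Mb p_ctg~${haploid_mb}Mb a_ctg<${alt_mb}Mb"
}

expected_sizes 12 4   # prints: total=48Mb p_ctg~12Mb a_ctg<36Mb
```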

chhylp123 avatar Sep 30 '22 18:09 chhylp123