ERROR6 and other messages are probably not just warnings

Open schellt opened this issue 1 year ago • 6 comments

Dear all, we have been using hifiasm for a long time, but we recently observed strange behavior in one particular assembly.

When we run hifiasm 0.19.8, the tool reports ERROR6 and several other error messages:

```console
$ grep "ERROR" hifiasm-0.19.8.err
ERROR6
ERROR6
ERROR6
ERROR6
ERROR-r-break
ERROR-read
```
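Because the raw `grep` output repeats each message, a count per message type can be easier to compare between runs (for example, the combined run versus the two single-SMRT-cell runs). The following is only a minimal sketch: `tally_errors` is a hypothetical helper name, and it assumes each message appears as a whitespace-delimited token starting with `ERROR`, as in the output above.

```shell
# Hypothetical helper (not part of hifiasm): count each distinct ERROR message in a log.
# Assumes messages are whitespace-delimited tokens that start with "ERROR".
tally_errors() {
  # -o prints only the matching token; LC_ALL=C makes the sort order locale-independent,
  # so identical messages end up adjacent for uniq -c to count.
  grep -o 'ERROR[^[:space:]]*' "$1" | LC_ALL=C sort | uniq -c
}

# Usage: tally_errors hifiasm-0.19.8.err
```

For the log shown above, this should report four `ERROR6` lines and one line each of `ERROR-r-break` and `ERROR-read`.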

Of course, we have seen the other issues in which you say these messages are nothing to worry about, but we are concerned that something is indeed going wrong, at least in our case.

We are running hifiasm 0.19.8 with default parameters on Hi-C data and HiFi reads from two PacBio Revio SMRT cells. The job is submitted via Slurm to a compute node. At first we suspected a memory issue, but peak RAM usage was only around 233 GB, whereas 900 GB had been allocated.

[Image: hifiasm-error6]

Interestingly, when we assemble the data from each of the two SMRT cells separately, neither ERROR6 nor the other messages appear.

Unfortunately, the problems don't end there. When we run a reference-based annotation with TOGA (https://github.com/hillerlab/TOGA), the first haplotype looks reasonably good, but the second haplotype lacks around 1,500 genes that we expect to find. Whole-genome alignments suggest that sequence is actually missing from haplotype 2.

I would be very happy if you could have a look at this. We are also happy to share the data with you for further investigation.

Thank you very much in advance. Best, Tilman

schellt, Dec 19 '23

In general, I think these are all just warnings, so they shouldn't matter much. I suspect there may be some other issue. If you could share the bin files with me, that would be very helpful. Thanks so much!

chhylp123, Dec 22 '23

Thanks for offering to have a look at this. I sent you an email at [email protected] with details on how to access the files.

schellt, Dec 27 '23

Thanks so much!

chhylp123, Jan 02 '24

Dear @chhylp123, have you had a chance to look at the bin files? Thank you. Best, Tilman

schellt, Jan 30 '24

Dear @chhylp123, it would be great if you could have a look at this. Below are screenshots of example regions in the de novo assembled haplotype 1 that contain genes missing from haplotype 2. In each example, the top figure is a screenshot of an alignment to hg38, and the figure below it is an IGV screenshot of the corresponding region in the assembled haplotype 1. Unfortunately, IGV cannot display the whole range of the regions given for the alignments. To me, the coverage does not look suspicious here, which might point towards an assembly issue.

hg38 chr19:5,257,814-6,757,813, roughly corresponding to HLeliQue1A coordinates HAP1_SUPER_16:353,892-1,039,785:
[Image: hap1_super16]
IGV screenshot for HAP1_SUPER_16:674,008-719,670:
[Image]

hg38 chr19:54,345,842-54,560,522, corresponding to HLeliQue1A coordinates HAP1_SUPER_19:64,995,898-65,223,886:
[Image: hap1_super_19]
IGV screenshot for HAP1_SUPER_19:65,089,716-65,130,068:
[Image]

Thank you very much in advance. Best, Tilman

schellt, Mar 04 '24

Dear Heng @lh3 and developers,

we think this really is a hifiasm bug that may have gone unnoticed, because people generally expect the second haplotype to be less complete, as it lacks the sex chromosomes. In our case, however, the affected regions are autosomal, gene-rich, and have normal read coverage. In other words, while the reads suggest that these autosomal regions should be present in both haplotypes, haplotype 2 lacks them entirely.

It would be great if somebody could look into this. We are happy to share all the data.

Thx a lot Michael

MichaelHiller, Mar 05 '24