BR: HMMRATAC fails to run on large genomes needing .csi index

Open TeiturAK opened this issue 2 years ago • 4 comments

Describe the bug: I'm running HMMRATAC on several plant genomes that vary greatly in size. HMMRATAC fails on the largest genomes, which require a .csi index, with the following error:

Exception in thread "main" java.lang.RuntimeException: Invalid file header in BAM index spruce.sorted.unique_mapped.MT_CP_removed.bam.csi: ^_^D

It works fine on the smaller genomes for which I can generate a .bai index.
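For context: the .bai binning scheme can only address reference sequences up to 2^29 bp (512 Mbp), which is why genomes with longer contigs need a .csi index instead. Below is a minimal sketch of that check. The contig names and lengths are made up for illustration, and the class is not part of HMMRATAC.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BaiLimitCheck {
    // The BAI binning scheme (min_shift = 14, depth = 5) covers coordinates up to 2^29.
    static final long BAI_MAX_CONTIG_LENGTH = 1L << 29; // 536,870,912 bp

    // Returns true if any contig is too long to be addressed by a .bai index.
    static boolean needsCsi(Map<String, Long> contigLengths) {
        for (long length : contigLengths.values()) {
            if (length > BAI_MAX_CONTIG_LENGTH) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical contig lengths; in practice these come from the @SQ lines of the BAM header.
        Map<String, Long> contigs = new LinkedHashMap<>();
        contigs.put("small_contig", 120_000_000L);
        contigs.put("large_contig", 900_000_000L);
        System.out.println("needs .csi: " + needsCsi(contigs));
    }
}
```

Note that the limit applies per contig, not to the total genome size, so a large genome split into short scaffolds can still be indexed with .bai.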

System:

  • OS: Linux
  • HMMRATAC Version 1.2.10

Additional context: I'm working with a ~20GB genome that requires a .csi index. I did not use multithreading when creating the index, and simply renaming the index to have a .bai extension does not help.
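Renaming cannot help because the two formats differ at the byte level: a .bai file starts with the magic bytes `BAI\1`, whereas a .csi file is BGZF-compressed (so it starts with the gzip magic 0x1f 0x8b) and only shows `CSI\1` after decompression. The `^_` at the end of the error above is consistent with a leading 0x1f byte, although that is an interpretation of the message rather than something confirmed in this thread. A minimal sketch for telling the formats apart, using only the Java standard library (class and method names are hypothetical, not HMMRATAC or htsjdk API):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class IndexFormatSniffer {
    private static final byte[] BAI_MAGIC = {'B', 'A', 'I', 1};
    private static final byte[] CSI_MAGIC = {'C', 'S', 'I', 1};

    // Returns "BAI", "CSI", or "UNKNOWN" based on the leading bytes of an index file.
    static String sniff(byte[] data) {
        if (startsWith(data, BAI_MAGIC)) {
            return "BAI";
        }
        boolean gzipped = data.length >= 2
                && (data[0] & 0xff) == 0x1f && (data[1] & 0xff) == 0x8b;
        if (gzipped) {
            // BGZF blocks are valid gzip members, so the first block can be inflated directly.
            try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
                byte[] magic = new byte[4];
                int read = 0;
                while (read < magic.length) {
                    int n = in.read(magic, read, magic.length - read);
                    if (n < 0) break;
                    read += n;
                }
                if (read == magic.length && startsWith(magic, CSI_MAGIC)) {
                    return "CSI";
                }
            } catch (IOException e) {
                return "UNKNOWN";
            }
        }
        return "UNKNOWN";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    // Demo helper: gzip-compresses a payload in memory to mimic a compressed index header.
    static byte[] gzip(byte[] payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(payload);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Synthetic headers only; a real tool would read the bytes of the index file itself.
        System.out.println(sniff(new byte[] {'B', 'A', 'I', 1, 0, 0, 0, 0}));
        System.out.println(sniff(gzip(new byte[] {'C', 'S', 'I', 1, 14, 0, 0, 0})));
    }
}
```

A reader that only checks for `BAI\1` will reject a .csi file at the first byte regardless of its file name, which matches the behaviour described here.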

Any help would be much appreciated. Teitur

TeiturAK avatar May 01 '22 07:05 TeiturAK

@TeiturAK I see no reference to HMMRATAC being able to process .csi files. Why did you think this should work?

Mouwrice avatar May 03 '22 12:05 Mouwrice

The internal samtools dependency uses a reader implementation that has been deprecated for years and does not seem to support .csi index files. The dependency should be updated to the latest release and the code refactored to use the new reader implementation.

Mikxox avatar May 12 '22 13:05 Mikxox

@TeiturAK is it possible to share a .csi index file? I would like to implement this feature/fix the bug and test whether it works. I'm but a noble computer science student and have no idea where to find a .csi file to test this feature. After this project is done I will contact you again so I can share the implementation, of course :slightly_smiling_face:.

jitsedesmet avatar May 28 '22 08:05 jitsedesmet

I have been able to generate a csi file myself and my implementation seems to work for me. I will link my implementation once it can be made public here. You are still welcome to provide me with your data so I can make sure it works. (Although I understand that sharing data in some fields is not trivial in which case I hope it'll work for you) :smile:

jitsedesmet avatar May 28 '22 14:05 jitsedesmet