htsjdk
SAMFileWriterFactory creates .bai file when writing .cram file
Description of the issue:
When using the SAMFileWriterFactory to write a .cram file with the "create index" default toggled on, it creates a .bai file for the index rather than a .crai file. This means that, e.g., when running gatk MergeSamFiles --CREATE_INDEX… with a .cram output, you end up with an output.cram.bai file instead of output.cram.crai.
Your environment:
- version of htsjdk: 3.0.1
- version of java: 17
- which OS: MacOS
Steps to reproduce
Run gatk MergeSamFiles as described above.
Expected behaviour
You should get a .crai file.
Actual behaviour
You get a .bai file.
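To confirm which index was written, one option is to look for sibling files next to the output. The sketch below uses only the Java standard library and is not part of htsjdk or GATK; the names IndexCheck and findIndexes are hypothetical, and it assumes the index is named as the output file name plus .bai or .crai, as in the report above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class IndexCheck {
    // Hypothetical helper: returns the names of index files found next to
    // the given output, assuming "<output name>.bai" / "<output name>.crai".
    static List<String> findIndexes(Path output) {
        String name = output.getFileName().toString();
        List<String> found = new ArrayList<>();
        for (String suffix : new String[] {".bai", ".crai"}) {
            Path candidate = output.resolveSibling(name + suffix);
            if (Files.exists(candidate)) {
                found.add(candidate.getFileName().toString());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Defaults to the file name used in this report if no argument is given.
        Path output = Path.of(args.length > 0 ? args[0] : "output.cram");
        System.out.println(findIndexes(output));
    }
}
```

Running this against the MergeSamFiles output should list output.cram.bai if the behaviour described here is reproduced.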
There are a few very old issues surrounding .crai files in the repo. According to this issue it seems like support was added for this but kept off for reasons discussed here. Perhaps it's too much to resurrect the project of getting these indices sorted out, but at the moment it seems GATK just silently puts out .cram.bai files due to this, which can be pretty confusing. I don't know enough about CRAM vs BAM to know how bad it might be to use one index for the other, but at least GATK seems to work just fine doing random access on CRAMs with the .bai file produced as described above. I'm also not sure if this issue should be pushed up to GATK or kept down here in htsjdk. At the very least it'd be nice if the library could be updated to use the proper file extension for the index.
@rickymagner It's actually producing a bai index, not a crai. So it would be equally wrong to rename it to crai. It would be great to fix it to make a crai index but I think it's a bit of a project.