
BAM versus CRAM index extension pattern

Open marchoeppner opened this issue 5 years ago • 4 comments

GATK Version 4.0.9.0

What is the issue: When using the -OBI flag in e.g. ApplyBQSR, GATK uses an inconsistent pattern for adding index extensions

When using BAM format as output: my_file.bam my_file.bai

When using CRAM format as output: my_file.cram my_file.cram.bai

Since e.g. samtools uses the second pattern for both BAM and CRAM (well, actually they have .crai, but that is a different discussion), I think it would be sensible to adopt that scheme. It's not a huge deal, but I tripped over it when writing a pipeline where the user could specify the desired output format, and noticed that there is this odd difference.
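A minimal sketch of how a pipeline might cope with the two naming patterns, using only the behaviour reported above for GATK 4.0.9.0 with -OBI. The function names are hypothetical helpers for illustration, not part of GATK or samtools, and other GATK versions may behave differently.

```python
from pathlib import Path


def gatk_index_path(output: str) -> Path:
    """Index path as reported in this issue for GATK 4.0.9.0 with -OBI.

    Hypothetical helper: it only encodes the two cases described above.
    """
    p = Path(output)
    if p.suffix == ".bam":
        # my_file.bam -> my_file.bai (extension is replaced)
        return p.with_suffix(".bai")
    if p.suffix == ".cram":
        # my_file.cram -> my_file.cram.bai (extension is appended)
        return p.with_name(p.name + ".bai")
    raise ValueError(f"unsupported output format: {output}")


def appended_index_path(output: str) -> Path:
    """Index path when the index extension is appended to the full file name.

    This is the pattern the issue attributes to samtools (.bam.bai, .cram.crai).
    """
    p = Path(output)
    index_ext = {".bam": ".bai", ".cram": ".crai"}
    if p.suffix not in index_ext:
        raise ValueError(f"unsupported output format: {output}")
    return p.with_name(p.name + index_ext[p.suffix])
```

A pipeline that lets the user pick the output format could call `gatk_index_path` to locate the index GATK wrote and, if a uniform layout is wanted, rename it to `appended_index_path`.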

marchoeppner avatar Oct 11 '18 09:10 marchoeppner

@marchoeppner I feel your pain. It's really annoying and causes all sorts of irritation to have different schemes for labelling the index. Unfortunately, I think we may be stuck with a world where we have to expect both index schemes for bam. I believe we've clamped down for cram and future formats and are only accepting the .cram.crai version.

(cram.bai is a different species of horrible that will hopefully go away soon.)

lbergelson avatar Oct 17 '18 17:10 lbergelson

Bumping this: is it possible for GATK4 tools to also accept .bam.bai (as well as the standard ^.bai for backwards compatibility)? Potentially a flag to output in this format too? We're trying for consistency in our approach to indexed bam files, but GATK seems to be the major one that's not accepting or outputting this format.

illusional avatar Dec 16 '19 04:12 illusional

@illusional You can specify an alternate index path for bam files in most tools using the --read-index argument.
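As a rough sketch of how a wrapper script could use this, the snippet below looks for an index next to a BAM under either naming scheme and builds the matching arguments. Only `-I` and `--read-index` are GATK arguments; the function names and the lookup order are assumptions made for illustration.

```python
from pathlib import Path
from typing import List, Optional


def find_bam_index(bam: str) -> Optional[Path]:
    """Return the first existing index for `bam`, or None if there is none.

    Hypothetical helper: checks my_file.bam.bai first, then my_file.bai.
    """
    p = Path(bam)
    candidates = [
        p.with_name(p.name + ".bai"),  # my_file.bam.bai
        p.with_suffix(".bai"),         # my_file.bai
    ]
    for candidate in candidates:
        if candidate.exists():
            return candidate
    return None


def read_index_args(bam: str) -> List[str]:
    """Build input arguments, adding --read-index when an index was found."""
    index = find_bam_index(bam)
    if index is None:
        # No index on disk: pass the input alone and let the tool decide.
        return ["-I", bam]
    return ["-I", bam, "--read-index", str(index)]
```

The returned list can be appended to a command such as `["gatk", "ApplyBQSR", ...]` before it is handed to `subprocess.run`.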

lbergelson avatar Dec 16 '19 16:12 lbergelson

This is also the case for MarkDuplicates, which writes a cram.bai index when the output is cram.

FriederikeHanssen avatar Sep 06 '22 16:09 FriederikeHanssen