
Invalid bai file for some BAM files

Open Hypercubed opened this issue 13 years ago • 4 comments

I'm having issues with some publicly available ENCODE BAM files. It appears that the bai file created by BamTools is broken. I'm not sure whether the BAM file is bad or there is a bug in BamTools. Here are the steps to reproduce the issue:

I downloaded the following BAM file and BAI file:

http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeCaltechRnaSeq/wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2.bam
http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeCaltechRnaSeq/wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2.bam.bai

Performing a count on these files gives reasonable results:

$ bamtools count -in ./og/wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2.bam -region chr8:128746973-128755020
90772

However, if I create a new index file using the latest version of bamtools (May 31, 2012) and then perform the same operation, the count returns 0 results:

$ mv wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2.bam.bai wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2.bam.bai.bak
$ bamtools index -in wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2.bam
$ bamtools count -in wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2.bam -region chr8:128746973-128755020
0

Indeed, if I examine the sizes of the original and new index files, they are not the same.

4036792 May 31 11:55 wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2.bam.bai
4038128 May 21 10:15 wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2.bam.bai.bak

If, however, I sort and then index the BAM file (again using bamtools), the count works.

$ bamtools sort -in wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2.bam -out wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2_bt_sorted.bam
$ bamtools index -in wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2_bt_sorted.bam
$ bamtools count -in wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2_bt_sorted.bam -region chr8:128746973-128755020
90772

Perhaps the original bam file was sorted incorrectly, resulting in an invalid bai file, OR sorting the bam file fixes some irregularity in the ENCODE bam file. Either way, BamTools indexing should at least report an error. I hope this helps.

Hypercubed avatar May 31 '12 09:05 Hypercubed

Yeah, if it's working after re-sorting, then it sounds like there may have been an error upstream, made by whoever sorted the original file.

But I agree - there should be improved error reporting, assuming it can be properly detected/identified, from the command-line tool. Thanks for the heads-up; I'm making a note to look into this.

pezmaster31 avatar May 31 '12 19:05 pezmaster31
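
The detection step discussed above could be sketched roughly as follows. This is only an illustration, not BamTools code: it assumes the underlying problem is a file that is not coordinate-sorted (the thread does not confirm this), and it works on plain (ref_id, pos) pairs rather than reading a real BAM file. The function name `first_unsorted_record` is hypothetical.

```python
from typing import Iterable, Optional, Tuple


def first_unsorted_record(records: Iterable[Tuple[int, int]]) -> Optional[int]:
    """Return the 0-based index of the first record that breaks
    coordinate order, or None if the whole stream is in order.

    Each record is a (ref_id, pos) pair. A negative ref_id stands for an
    unplaced read; this sketch assumes such reads come after all placed ones.
    """
    prev_key = None
    for i, (ref_id, pos) in enumerate(records):
        # Map unplaced reads to a key that compares greater than any placed read.
        key = (float("inf"), 0) if ref_id < 0 else (ref_id, pos)
        if prev_key is not None and key < prev_key:
            return i  # this record sorts before its predecessor
        prev_key = key
    return None


if __name__ == "__main__":
    # Second record goes backwards within reference 0, so index 1 is reported.
    print(first_unsorted_record([(0, 100), (0, 50), (1, 10)]))
```

An indexer could run a check like this while scanning the file and stop with an error message at the first offending record instead of silently writing an index.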

I'm wondering how the original bai file was created. SamTools also fails to index this bam file.

Hypercubed avatar Jun 01 '12 05:06 Hypercubed

BamTools can't handle .bai files properly. The problem is not with the bam file itself, and .bti indexes seem to work fine. Also, the BamReader Close() function seems to be implemented incorrectly: it returns void instead of a bool value as stated in the documentation. Moreover, after calling reader.Close(), the reader cannot be directly reused to open other files, which results in an error. This is a big issue for the API and deserves a higher-priority fix.

eksyang avatar Sep 24 '12 15:09 eksyang

@isuxyang - BamReader::Close() certainly returns a boolean value, and you should be able to reuse the reader (I have no trouble with this in my code). Are you sure that you have an up-to-date version of the API?

If not, please update and see if your problem is solved. If it is up-to-date and the issue remains, can you provide more info about it not handling .bai files "properly"? E.g. What are you expecting? What are you actually seeing? Any error messages (from the BamTools API, or at compile time/runtime)?

pezmaster31 avatar Sep 24 '12 16:09 pezmaster31