bamtools
Invalid bai file for some BAM files
I'm having issues with some publicly available ENCODE BAM files. It appears that the BAI file created by BamTools is broken. I'm not sure whether the BAM file is bad or there is a bug in BamTools. Here are the steps to reproduce the issue:
I downloaded the following BAM file and BAI file:
http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeCaltechRnaSeq/wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2.bam
http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeCaltechRnaSeq/wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2.bam.bai
Performing a count on these files gives reasonable results:
$ bamtools count -in ./og/wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2.bam -region chr8:128746973-128755020
90772
However, if I create a new index file using the latest version of bamtools (May 31, 2012) and then perform the same operation, the count returns 0 results:
$ mv wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2.bam.bai wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2.bam.bai.bak
$ bamtools index -in wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2.bam
$ bamtools count -in wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2.bam -region chr8:128746973-128755020
0
Indeed, if I examine the sizes of the original (.bak) and new index files, they are not the same:
4036792 May 31 11:55 wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2.bam.bai
4038128 May 21 10:15 wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2.bam.bai.bak
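To see where the two index files differ rather than only that their sizes differ, the per-reference contents can be compared. This is a minimal sketch, assuming the standard BAI layout from the SAM/BAM specification (magic `BAI\1`, then bins, chunks, and linear-index intervals for each reference); `summarize_bai` is a hypothetical helper, not part of BamTools or SamTools.

```python
import struct


def summarize_bai(path):
    """Return one (n_bins, n_chunks, n_intervals) tuple per reference in a BAI file."""
    with open(path, "rb") as fh:
        data = fh.read()
    if data[:4] != b"BAI\x01":
        raise ValueError("not a BAI file: %s" % path)
    (n_ref,) = struct.unpack_from("<i", data, 4)
    offset = 8
    summary = []
    for _ in range(n_ref):
        (n_bin,) = struct.unpack_from("<i", data, offset)
        offset += 4
        n_chunks = 0
        for _ in range(n_bin):
            # Each bin: uint32 bin id, int32 chunk count, then 2 x uint64 per chunk.
            # Any metadata pseudo-bin an indexer writes is counted like a normal bin.
            _bin_id, n_chunk = struct.unpack_from("<Ii", data, offset)
            offset += 8 + 16 * n_chunk
            n_chunks += n_chunk
        # Linear index: int32 interval count, then one uint64 offset per interval.
        (n_intv,) = struct.unpack_from("<i", data, offset)
        offset += 4 + 8 * n_intv
        summary.append((n_bin, n_chunks, n_intv))
    return summary
```

Calling `summarize_bai` on the `.bam.bai` and `.bam.bai.bak` files and comparing the two lists entry by entry should show which references the indexes disagree on.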
If, however, I sort and then index the BAM file (again using bamtools), the count works:
$ bamtools sort -in wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2.bam -out wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2_bt_sorted.bam
$ bamtools index -in wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2_bt_sorted.bam
$ bamtools count -in wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2_bt_sorted.bam -region chr8:128746973-128755020
90772
Perhaps the original BAM file was sorted incorrectly, resulting in an invalid BAI file, or sorting the BAM file fixes some irregularity in the ENCODE BAM file. Either way, BamTools indexing should at least report an error. I hope this helps.
Yeah, if it's working after re-sorting, then it sounds like there may have been an error upstream, made by whoever sorted the original file.
But I agree - there should be improved error reporting, assuming it can be properly detected/identified, from the command-line tool. Thanks for the heads-up; I'm making a note to look into this.
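One way the condition could be detected before indexing is to scan the BAM file for the first record that breaks coordinate order. The following is a minimal sketch, not how BamTools does or would do it; it assumes the BAM layout from the SAM/BAM specification and relies on Python's `gzip` module reading BGZF as concatenated gzip members. `first_unsorted_record` is a hypothetical helper name.

```python
import gzip
import struct


def first_unsorted_record(bam_path):
    """Return (record_index, ref_id, pos) for the first alignment that breaks
    coordinate order, or None if the whole file is coordinate-sorted."""
    with gzip.open(bam_path, "rb") as fh:
        if fh.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file: %s" % bam_path)
        (l_text,) = struct.unpack("<i", fh.read(4))
        fh.read(l_text)  # skip the SAM header text
        (n_ref,) = struct.unpack("<i", fh.read(4))
        for _ in range(n_ref):
            (l_name,) = struct.unpack("<i", fh.read(4))
            fh.read(l_name + 4)  # skip reference name and its length field
        prev = None
        index = 0
        while True:
            raw = fh.read(4)
            if len(raw) < 4:
                return None  # reached end of file without a violation
            (block_size,) = struct.unpack("<i", raw)
            block = fh.read(block_size)
            ref_id, pos = struct.unpack_from("<ii", block)
            # Reads without a reference (ref_id == -1) belong after all others.
            key = (ref_id if ref_id >= 0 else n_ref, pos)
            if prev is not None and key < prev:
                return index, ref_id, pos
            prev = key
            index += 1
```

A result other than `None` for the original ENCODE file would point to the upstream sort as the cause; `pos` is 0-based, as stored in the BAM record.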
I'm wondering how the original bai file was created. SamTools also fails to index this bam file.
BamTools can't handle .bai files properly. This is not a problem with the BAM file itself; .bti indexes seem to work fine. Also, BamReader's Close() function seems to be implemented incorrectly: it returns void instead of a bool value as stated in the documentation. In addition, after calling reader.Close(), the reader cannot be directly reused to open other files; doing so results in an error. This is a big issue for the API and should get a higher priority for a fix.
@isuxyang - BamReader::Close() certainly returns a boolean value, and you should be able to reuse the reader (I have no trouble with this in my code). Are you sure that you have an up-to-date version of the API?
If not, please update and see if your problem is solved. If not (it's up-to-date and the issue remains), can you provide more info about it not handling .bai file "properly"? E.g. What are you expecting? What are you actually seeing? Any error messages (from BamTools API or compile time/runtime)?