
BR: Invalid file header in BAM index

Open tahia opened this issue 4 years ago • 7 comments

Hi there

I'm running HMMRATAC_V1.2.10_exe.jar to call peaks for ATAC data. I have 4 replicates, so I merged, sorted and indexed the BAM files and then called peaks with HMMRATAC.

This is a redhat-release-server-7.8-2.el7.x86_64 and I'm running openjdk version "11.0.7" 2020-04-14 LTS.

Here is the command I'm running:

java -Xmx12G -jar /path/to/HMMRATAC/HMMRATAC_V1.2.10_exe.jar -b /path/to/bam/file/C_F_sorted.bam -i /path/to/bam/file/C_F_sorted.bai -g path/to/genInfo/genome.info -o /path/to/output/C_F 1> logs/C_F.log

I'm getting the following error:

Exception in thread "main" java.lang.RuntimeException: Invalid file header in BAM index /home/taslima/Data/PH/ATACSEQ_EXP/ATACSEQ/REFHAL/RHAL_Merge/C_F_sorted.bai: ^_^D
    at net.sf.samtools.AbstractBAMFileIndex.<init>(AbstractBAMFileIndex.java:90)
    at net.sf.samtools.DiskBasedBAMFileIndex.<init>(DiskBasedBAMFileIndex.java:46)
    at net.sf.samtools.BAMFileReader.getIndex(BAMFileReader.java:232)
    at net.sf.samtools.BAMFileReader.createIndexIterator(BAMFileReader.java:592)
    at net.sf.samtools.BAMFileReader.query(BAMFileReader.java:352)
    at net.sf.samtools.SAMFileReader.query(SAMFileReader.java:363)
    at HMMR_ATAC.pullLargeLengths.read(pullLargeLengths.java:112)
    at HMMR_ATAC.pullLargeLengths.<init>(pullLargeLengths.java:61)
    at HMMR_ATAC.Main_HMMR_Driver.main(Main_HMMR_Driver.java:219)

Any idea why I'm getting this error?

Thanks!

May 19 '20 23:05 tahia

Can you share the commands used to create the merged and sorted BAM file, and the commands used to create the index file?

May 21 '20 13:05 EvanTarbell

Hi Evan

I think I figured it out. If I rename my index file to "base.bam.bai" and run the command I mentioned earlier, it produces the output. These are biological replicates and I wanted to run them individually, so I didn't merge them. Here are the commands I ran to create the sorted BAM file.

Remove Duplicates: java -Xms1G -Xmx4G -jar /path/picard.jar MarkDuplicates MAX_RECORDS_IN_RAM=4000000 INPUT=/path/C1-F_RGP.bam OUTPUT=/path/C1-F_markdup.bam METRICS_FILE=/home/taslima/Data/PH/ATACSEQ_EXP/ATACSEQ/REFHAL/BWA_FIL_RHAL_RMDUP/C1-F_markdup.txt REMOVE_DUPLICATES=true 1>logs/C1-F.log

Samtools sort: samtools sort -@ 4 -m 2G -o /path/C1-F_sorted.bam /path/C1-F_markdup.bam > logs/C1-F_samsort.log

Samtools index: samtools index -@ 4 -m 2G /path/C1-F_sorted.bam /path/C1-F_sorted.bai > logs/C1-F_index.log

May 21 '20 23:05 tahia

It sounds like the BAM file reader I use requires the index file to be suffixed with .bam.bai. I'll add that to the user guide so other users don't run into the same problem.
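The rename described above can be sketched in shell as follows. The paths are the placeholder paths from the original command, and the variable names are illustrative only. This is a minimal sketch, assuming the index sits next to the BAM file and only needs a new name.

```shell
# Placeholder paths taken from the command in the first post; substitute your own.
bam=/path/to/bam/file/C_F_sorted.bam
old_index=/path/to/bam/file/C_F_sorted.bai

# The reader appears to expect "<bam file name>.bai", i.e. C_F_sorted.bam.bai.
new_index="${bam}.bai"

# Rename only if the old index exists and the new name is not already taken.
if [ -f "$old_index" ] && [ ! -e "$new_index" ]; then
    mv "$old_index" "$new_index"
fi

echo "$new_index"
```

The renamed file is then the one to pass to HMMRATAC with -i.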

May 22 '20 13:05 EvanTarbell

Hi Evan

It looks like the bug is fixed on my local machine (Ubuntu release 18.04, openjdk 11.0.7) but not on the Red Hat server I mentioned in my first bug report. I'm testing the identical file with the same command, so I'm not sure what's going on. Any idea?
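One way to compare the index files on the two machines is to look at their first bytes. According to the SAM/BAM specification, a BAI index starts with the magic bytes "BAI" followed by the byte 0x01. The helper below is a hypothetical sketch, not part of HMMRATAC or samtools. It checks only the three ASCII bytes and assumes a `head` that supports `-c`.

```shell
# Hypothetical helper: succeeds (exit status 0) if the file begins with
# the ASCII bytes "BAI", the start of the BAI magic string.
has_bai_magic() {
    [ "$(head -c 3 "$1" 2>/dev/null)" = "BAI" ]
}

# Usage (placeholder path):
# has_bai_magic /path/to/bam/file/C_F_sorted.bam.bai && echo "looks like a BAI index"
```

If the check fails on one machine but passes on the other, the two index files are not byte-identical, whatever their names.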

May 23 '20 05:05 tahia

Hi @tahia and @EvanTarbell, I am facing the same problem. How did you solve it? I am using Ubuntu 20.04 with OpenJDK 11.0.10, and my index file is already named with the .bam.bai suffix.

I also used Picard to remove the duplicates and samtools to remove genes from the mitochondrial chromosome. I have tried creating a fresh index and re-sorting the file.

Mar 15 '21 14:03 pratarora

For others facing the same problem: if you are creating the index file with multithreading, try creating it without multithreading.
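A minimal sketch of the single-threaded indexing suggested above, assuming samtools is on the PATH. The wrapper function name is hypothetical. With no output name given, samtools index writes the index next to the BAM file as "<bam file name>.bai", which also matches the .bam.bai naming discussed earlier in this thread.

```shell
# Hypothetical wrapper: index a BAM file without the -@ (threads) option.
index_single_threaded() {
    bam=$1
    # No -@ and no explicit output name: the index is written to "$bam.bai".
    samtools index "$bam"
}

# Usage (placeholder path from the earlier post):
# index_single_threaded /path/C1-F_sorted.bam
```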

Mar 30 '21 07:03 pratarora

Check out #84, where @pratarora found that creating the .bai index file with the multithreading option in samtools could cause this error.

Mar 30 '21 13:03 EvanTarbell