bedtools2
bamtobed -bedpe bug results in additional entry in BED
bamtobed -bedpe adds an additional entry that does not exist in the BAM file. This seems to occur when the last read is unpaired and the previous read is paired in a name-sorted BAM containing paired-end data.
Here is a simple name-sorted test SAM (with 2 paired reads and 1 orphan read) that reproduces the error:
@HD VN:1.6 SO:queryname
@SQ SN:chr1 LN:248956422
A00000:000:H00AAAAAA:1:2488:32949:32283 83 chr1 153875873 60 75M = 153875807 -141 GACCCTCACCCCTCCACACCATCACCCCTCACCCACTTATAGTTTTCTCAGCTTCTTCATCCATCCCCTTCCTGN ,FFFFFFFF:FFFFFFFF,F,F:FFF:FFFFF:F,FF,FF:FFFFFFF:FFFFF,FF:FFF:,F:FF:FFF:FF#
A00000:000:H00AAAAAA:1:2488:32949:32283 163 chr1 153875807 60 73M = 153875873 141 AGTCATAGACGNATCCCNCTCCGNCGGGAAGANCATTCCCGCTATACAACTTCAAAGTAACGAGACAACTNAG FFFFF##FFF:FFFFFFFFFFFFFFFFFFFFFFFFFFF#FFFFF#FFFFFFFFFFFFFFFFF:FFFFFFFFF#
A00000:000:H00AAAAAA:1:2488:32949:32440 81 chr1 2376163 60 74M = 2375914 -323 TACGAATCACTACGACCGCTCCTAGGCCNAGCTGCCGCAGCTTATCTCACAATCGGCATGCTCTACTTTGACTC FFFFFFFFFFFFFFF,FFF,F,FFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFF:FFFFFFFFFFF#
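To make the structure of the test file explicit, here is a minimal Python sketch (not part of bedtools) that finds reads flagged as paired whose mate is not adjacent in name-sorted input. The helper name `find_orphans` is hypothetical, SEQ and QUAL are replaced with `*` for brevity, and the sketch assumes there are no secondary or supplementary alignments.

```python
from itertools import groupby

# Records from the test SAM above; SEQ and QUAL are replaced with "*" for brevity.
SAM_RECORDS = """\
A00000:000:H00AAAAAA:1:2488:32949:32283 83 chr1 153875873 60 75M = 153875807 -141 * *
A00000:000:H00AAAAAA:1:2488:32949:32283 163 chr1 153875807 60 73M = 153875873 141 * *
A00000:000:H00AAAAAA:1:2488:32949:32440 81 chr1 2376163 60 74M = 2375914 -323 * *
"""

PAIRED = 0x1  # SAM flag bit: read is part of a pair


def find_orphans(sam_text):
    """Return names of reads flagged as paired whose mate is not adjacent.

    Hypothetical helper: assumes name-sorted input without secondary or
    supplementary alignments.
    """
    rows = [line.split() for line in sam_text.splitlines()
            if line and not line.startswith("@")]
    orphans = []
    # In a name-sorted file mates sit next to each other, so group by QNAME.
    for name, group in groupby(rows, key=lambda r: r[0]):
        records = list(group)
        if len(records) == 1 and int(records[0][1]) & PAIRED:
            orphans.append(name)
    return orphans


print(find_orphans(SAM_RECORDS))
```

On these three records it reports only A00000:000:H00AAAAAA:1:2488:32949:32440 (flag 81: paired, reverse strand, first in pair), the read whose mate is missing.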
I convert it to BAM and run bedtools bamtobed as follows:
samtools view -hb test.sam > test.small.bam
bedtools bamtobed -bedpe -i test.small.bam > test.small.bed
which gives the following output:
cat test.small.bed
chr1 153875806 153875879 chr1 153875872 153875947 A00000:000:H00AAAAAA:1:2488:32949:32283 60 + -
chr1 153875806 153875879 chr1 153875806 153875879 A00000:000:H00AAAAAA:1:2488:32949:32283 60 + +
The expected output is, of course, only the paired read:
chr1 153875806 153875879 chr1 153875872 153875947 A00000:000:H00AAAAAA:1:2488:32949:32283 60 + -
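The expected line can be derived by hand from the two paired records. The sketch below does this in Python under some stated assumptions: the lower-coordinate mate is listed first (as in the output above), both mates are on the same chromosome, and the score is the lower of the two MAPQ values (both are 60 here, so this choice does not affect the result). The function names are hypothetical and this is not the bedtools implementation.

```python
import re

# The two paired records from the test SAM; SEQ and QUAL replaced with "*".
MATE_A = "A00000:000:H00AAAAAA:1:2488:32949:32283 83 chr1 153875873 60 75M = 153875807 -141 * *"
MATE_B = "A00000:000:H00AAAAAA:1:2488:32949:32283 163 chr1 153875807 60 73M = 153875873 141 * *"

REVERSE = 0x10  # SAM flag bit: read is on the reverse strand


def ref_length(cigar):
    """Number of reference bases covered by a CIGAR string."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(n) for n, op in ops if op in "MDN=X")


def to_interval(fields):
    """Convert SAM fields to (chrom, 0-based start, end, strand)."""
    flag, chrom, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
    start = pos - 1  # SAM POS is 1-based, BED start is 0-based
    end = start + ref_length(cigar)
    strand = "-" if flag & REVERSE else "+"
    return chrom, start, end, strand


def pair_to_bedpe(rec_a, rec_b):
    """Build one BEDPE row; assumes same chromosome, lower start first."""
    a, b = rec_a.split(), rec_b.split()
    first, second = sorted([to_interval(a), to_interval(b)], key=lambda iv: iv[1])
    score = min(int(a[4]), int(b[4]))  # assumption: lower MAPQ of the two mates
    return [first[0], first[1], first[2],
            second[0], second[1], second[2],
            a[0], score, first[3], second[3]]


print("\t".join(str(x) for x in pair_to_bedpe(MATE_A, MATE_B)))
```

For the pair in the test file this reproduces the coordinates, score, and strands of the expected line; the orphan read has no mate and so contributes no row.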
bamtobed also prints this warning:
*****WARNING: Query A00000:000:H00AAAAAA:1:2488:32949:32440 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping.
As you can see, there is an additional entry in the BED.
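Until this is fixed, one way to spot such rows after the fact is to look for the pattern seen in the output above: both ends identical, or a read name that has already appeared. This is only a heuristic sketch with a hypothetical helper name, and a repeated name may be legitimate if the BAM contains secondary or supplementary alignments.

```python
# The two rows bamtobed produced for the test file.
BEDPE_OUTPUT = """\
chr1 153875806 153875879 chr1 153875872 153875947 A00000:000:H00AAAAAA:1:2488:32949:32283 60 + -
chr1 153875806 153875879 chr1 153875806 153875879 A00000:000:H00AAAAAA:1:2488:32949:32283 60 + +
"""


def suspicious_rows(bedpe_text):
    """Return BEDPE rows whose two ends are identical or whose name repeats.

    Heuristic only: it matches the spurious row in this report, not every
    possible failure mode.
    """
    seen_names = set()
    flagged = []
    for line in bedpe_text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # not a full 10-column BEDPE row
        same_ends = fields[0:3] == fields[3:6]
        repeated_name = fields[6] in seen_names
        if same_ends or repeated_name:
            flagged.append(line)
        seen_names.add(fields[6])
    return flagged


for row in suspicious_rows(BEDPE_OUTPUT):
    print(row)
```

On the output above it flags only the second row.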
bedtools --version
bedtools v2.30.0