
bamtofastq creates duplicate reads

Open joowkim opened this issue 3 years ago • 1 comments

Hello,

I found an issue with bamtofastq in bedtools (v2.30.0). When I convert a BAM file to a FASTQ file, the output seems to contain duplicate reads: the same read appears more than once in the FASTQ from bedtools. The file size of the FASTQ from bedtools also seems to be doubled.

[image: bedtools-issue]

I am attaching a shell script to reproduce the issue. These are the versions of the tools I use:

bedtools - v2.30.0
samtools - 1.16
htslib - 1.16

set -euo pipefail

# download a BAM file from UCSC
wget -O test.bam  http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeUwRepliSeq/wgEncodeUwRepliSeqBg02esG1bAlnRep1.bam

# bam2fq
./bedtools.static.binary bamtofastq -i test.bam -fq bedtools.out.fq
samtools fastq test.bam > samtools.out.fq

# grep count
echo "this is from the bedtools output"
grep "SOLEXA-1GA-2_2_FC20EMB:5:251:979:328" bedtools.out.fq

echo ""
echo "this is from the samtools output"
grep "SOLEXA-1GA-2_2_FC20EMB:5:251:979:328" samtools.out.fq
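Grepping for one read name only checks a single read. To see how widespread the duplication is, something like the following could count every read name that appears more than once in a FASTQ file. This is only a rough sketch: the function name `count_dup_names` is made up for illustration, and it assumes plain 4-line FASTQ records and single-end data (for paired-end data, mates may legitimately share a name).

```shell
# Count read names that occur more than once in a FASTQ file.
# Assumption: standard 4-line FASTQ records (sequence lines are not wrapped).
# count_dup_names is a hypothetical helper, not part of bedtools or samtools.
count_dup_names() {
    # Line 1 of every 4-line record is the header; keep only the name token,
    # then list each name that appears at least twice and count those names.
    awk 'NR % 4 == 1 { print $1 }' "$1" | sort | uniq -d | awk 'END { print NR }'
}

# Possible usage with the two outputs from the script above:
# count_dup_names bedtools.out.fq
# count_dup_names samtools.out.fq
```

If bedtools really writes each read twice, the first count should be close to the number of reads, while the second should be at or near zero.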

joowkim • Sep 08 '22 14:09

Hello bedtools devs,

I also see duplicate reads using v2.30.0. I do not see duplicates using v2.26.0.

I should note that I have an unaligned BAM file, so the issue is not related to the number of mappings.

kcanderson • May 01 '23 20:05