BamHash icon indicating copy to clipboard operation
BamHash copied to clipboard

FASTQ read name/description when computing hash

Open guillermo-carrasco opened this issue 8 years ago • 1 comments

Hi,

I have a question regarding how you use the FASTQ description field to calculate the hash. The question arises because I've been doing some tests and I only manage to get the same md5 using the -R option in both the BAM file and the original FASTQ files.

More details following:

I've trimmed a FASTQ sample for testing purposes, the reads look like this:

~> zcat ../data/NA12878_trimmed_1.fastq.gz | head -n 4
@ERR194147.1 HSQ1004:134:C0D8DACXX:1:1104:3874:86238/1
GGTTCCTACTTCAGGGTCATAAAGCCTAAATAGCCCACACGTTCCCCTTAAATAAGACATCACGATGGATCACAGGTCTATCACCCTATTAACCACTCACG
+
CC@FFFFFHHHHHJJJFHIIJJJJJJIHJIIJJJJJJJJIIGIJJIJJJIJJJIJIJJJJJJJJJJIJHHHHFFFDEEEEEEEEDDDCDDEEDDDDDDDDD

When, after the analysis, I run bamhash_checksum_fastq in the FASTQ files and bamhash_checksum_bam in the resulting BAM file I get different md5's:

~> bamhash_checksum_fastq ../data/NA12878_trimmed_*
a05de49644a0fb5d        10000

~> bamhash_checksum_bam final/NA12878_trimmed/NA12878_trimmed-ready.bam
d4d5ece0f619d83d        20000

If I convert the BAM file back to FASTQ I realised that the FASTQ read description disappears, i.e:

~> samtools fastq final/NA12878_trimmed/NA12878_trimmed-ready.bam | head -n 4
@ERR194147.6389/2
CATCGGATTTTTGTTTTTTTTGTTTTGGGTGGGGGGGGTTGGTGGGGTTGTGTGTGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGTGGGGGGGGTGGTTGG
+
11++40+2)2,+)2:3AA8)))00)))((('''''&&&))&&(((&&&&&&(((++(&05;BB7@>B@BDBBDDB>BDD@@3>9<&5-&&&&&&)&&)+(&

Is that description after the readname used to calculate the hash? I'm pretty confident that this is the problem, since if I run BamHash with -R it does return the expected result:

~> bamhash_checksum_fastq -R ../data/NA12878_trimmed_1.fastq.gz ../data/NA12878_trimmed_2.fastq.gz
f4524c00c70e9b83        10000

~> bamhash_checksum_bam -R final/NA12878_trimmed/NA12878_trimmed-ready.bam
f4524c00c70e9b83        20000

Thanks for your help!

guillermo-carrasco avatar Jan 18 '16 14:01 guillermo-carrasco