BamHash

Add support for interleaved FASTQ files

Open BrunoGrandePhD opened this issue 10 years ago • 4 comments

Thanks for developing this! This will certainly allow us to confidently delete original data files by first verifying data integrity.

I do have a feature request: it would be useful to have support for interleaved FASTQ files (where read pairs are consecutive in the file; compatible with BWA MEM).
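To make the layout concrete, here is a minimal sketch of walking an interleaved FASTQ stream two records at a time. It assumes plain 4-line FASTQ records with no wrapped sequence lines, and the names `read_fastq` and `iter_interleaved_pairs` are hypothetical. This only illustrates the file layout and is not how BamHash itself reads the data.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (name, sequence, quality) for each 4-line FASTQ record."""
    while True:
        block = list(islice(handle, 4))
        if not block:
            return
        if len(block) != 4:
            raise ValueError("truncated FASTQ record")
        header, seq, _plus, qual = (line.rstrip("\n") for line in block)
        if not header.startswith("@"):
            raise ValueError("FASTQ header must start with '@'")
        yield header[1:], seq, qual


def iter_interleaved_pairs(handle):
    """Yield (mate1, mate2) tuples from an interleaved FASTQ stream."""
    records = read_fastq(handle)
    for first in records:
        # In an interleaved file the mate is the very next record.
        second = next(records, None)
        if second is None:
            raise ValueError("odd number of records: last read has no mate")
        yield first, second


# Tiny made-up example: two read pairs, mates stored consecutively.
example = io.StringIO(
    "@read1/1\nACGT\n+\nIIII\n"
    "@read1/2\nTTGA\n+\nIIII\n"
    "@read2/1\nGGCC\n+\nIIII\n"
    "@read2/2\nAATT\n+\nIIII\n"
)
pairs = list(iter_interleaved_pairs(example))
print(len(pairs))  # 2
```

Four records in the file give two pairs, so a tool that counts pairs and a tool that counts reads will report different totals for the same data.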

Also, perhaps once this is implemented (if you choose to do so), could you create a new release with all the commits since v1.0? Thanks.

BrunoGrandePhD, Oct 14 '15 04:10

I just took a look at this and it shouldn't be too much work. This would apply only to FASTQ files, not FASTA.

After this I'll tag a new release as well.

pmelsted, Nov 10 '15 21:11

The interleaved branch can now handle interleaved reads. I've also added checks to the test directory, and they pass.

@brunogrande: I would appreciate it if you could try this on your data, just so that I'm sure it works in the wild. Then I can tag a new release.

pmelsted, Nov 11 '15 14:11

@pmelsted: We have tested the interleaved branch on our interleaved FASTQ files and the hash sums check out (see below).

One thing, though: I assume the number next to each hash sum is the number of reads or read pairs. I would expect these to be the same.

BAM:
93f3169cad3f0047        98551238

FASTQ (after BAM-to-FASTQ conversion):
93f3169cad3f0047        49275619
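
As a quick sanity check on the two lines above, the sketch below parses them and compares the hashes and counts. The two strings are copied from this comment; `parse_line` is a hypothetical helper, and the "hash then count" column layout is assumed from the output shown here rather than from any documentation.

```python
# Output lines copied from the comment above: "<hash> <count>".
bam_line = "93f3169cad3f0047        98551238"
fastq_line = "93f3169cad3f0047        49275619"


def parse_line(line):
    """Split one output line into (hash string, integer count)."""
    digest, count = line.split()
    return digest, int(count)


bam_hash, bam_count = parse_line(bam_line)
fastq_hash, fastq_count = parse_line(fastq_line)

hashes_match = bam_hash == fastq_hash
ratio, remainder = divmod(bam_count, fastq_count)
print(hashes_match, ratio, remainder)  # True 2 0
```

The hashes agree and the BAM count is exactly twice the FASTQ count, which would be consistent with one count being per read and the other per read pair. Which of the two it is has not been confirmed in this thread.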

BrunoGrandePhD, Nov 17 '15 00:11

Also, thanks for this, Pall!

BrunoGrandePhD, Nov 17 '15 00:11