Add support for interleaved FASTQ files
Thanks for developing this! It will let us delete original data files with confidence after first verifying data integrity.
I do have a feature request: it would be useful to have support for interleaved FASTQ files (where read pairs are consecutive in the file; compatible with BWA MEM).
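To make the layout concrete, here is a minimal Python sketch of reading an interleaved FASTQ stream as pairs. It is only an illustration, not how this tool reads its input: the function names `read_fastq_records` and `read_interleaved_pairs` are hypothetical, and it assumes plain four-line records with no wrapped sequence lines.

```python
import io
from itertools import islice


def read_fastq_records(handle):
    """Yield (name, sequence, quality) tuples from a four-line-per-record FASTQ stream."""
    while True:
        lines = [line.rstrip("\n") for line in islice(handle, 4)]
        if not lines:
            return  # clean end of input
        if len(lines) < 4 or not lines[0].startswith("@") or not lines[2].startswith("+"):
            raise ValueError("malformed FASTQ record")
        yield lines[0][1:], lines[1], lines[3]


def read_interleaved_pairs(handle):
    """Yield (first_mate, second_mate) from consecutive records (hypothetical helper)."""
    records = read_fastq_records(handle)
    for first in records:
        second = next(records, None)
        if second is None:
            raise ValueError("odd number of records; input is not interleaved")
        yield first, second


# Tiny made-up example: one read pair stored as two consecutive records.
example = "@r1/1\nACGT\n+\nIIII\n@r1/2\nTTGA\n+\nIIII\n"
example_pairs = list(read_interleaved_pairs(io.StringIO(example)))
```

The pairing here relies purely on record order; it does not check that the two names match, which a stricter reader might want to do.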
Also, perhaps once this is implemented (if you choose to do so), could you create a new release with all the commits since v1.0? Thanks.
I just took a look at this and it shouldn't be too much work. This would apply only to FASTQ files, not FASTA.
After this I'll tag a new release as well.
The interleaved branch can now work with interleaved reads. I've also added checks to the test directory, and they pass.
@brunogrande: I would appreciate it if you could try this on your data, just so that I'm sure it works in the wild. Then I can tag a new release.
@pmelsted: We have tested the interleaved branch on our interleaved FASTQ files and the hash sums check out (see below).
One thing though: I'm assuming that the number associated with each hash sum is the number of reads or read pairs. I would expect these to be the same for both files.
BAM:
93f3169cad3f0047 98551238
FASTQ (after BAM-to-FASTQ conversion):
93f3169cad3f0047 49275619
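A quick arithmetic check of the two counts above, as a small Python sketch. The variable names are made up for illustration, and what each count actually represents (individual reads versus read pairs) is an assumption that this thread does not confirm.

```python
# Counts copied from the two hash lines above.
bam_count = 98551238
fastq_count = 49275619

# Integer quotient and remainder of the BAM count over the FASTQ count.
ratio, remainder = divmod(bam_count, fastq_count)
print(ratio, remainder)  # prints: 2 0
```

The BAM count is exactly twice the FASTQ count, which would be consistent with one mode counting individual reads and the other counting pairs.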
Also, thanks for this, Pall!