Bug: Bowtie2 crashes on Zstandard-compressed FASTQ files with high compression level
Using the latest Bowtie2 (commit f515741, which includes #402), alignment of Zstandard-compressed FASTQ files crashes with this error:
Error, fewer reads in file specified with -1 than in file specified with -2
terminate called after throwing an instance of 'int'
(ERR): bowtie2-align died with signal 6 (ABRT) (core dumped)
This seems to depend on the compression level used to create the .zst files: with a low level (3) Bowtie2 finishes, while with higher levels (18 or 19) it crashes. I have seen the problem on many of my FASTQ files. When it crashes seems to depend on file size, read length and compression level.
Bowtie2 was compiled using `make -j 10 WITH_ZSTD=1`.
I managed to reproduce the problem with publicly available FASTQ files and a zebrafish GRCz11 index (built from the Ensembl FASTA).
- https://danio-code.zfin.org/files/annotated_files/ChIP-seq/DCD007385BS/ChIP-seq_Xie_Lab_H3K27me1_0001AS.DCD003249SQ.USERmickael.dong.R1.fastq.gz
- https://danio-code.zfin.org/files/annotated_files/ChIP-seq/DCD007385BS/ChIP-seq_Xie_Lab_H3K27me1_0001AS.DCD003249SQ.USERmickael.dong.R2.fastq.gz
To reproduce:
- Gunzip both files.
- Run `zstd -19 -T26 *.fastq` (lower -T if you have fewer cores).
- Run:

      ./bowtie2 -x danrer_genome_all_ensembl_grcz11/seq \
        -1 ChIP-seq_Xie_Lab_H3K27me1_0001AS.DCD003249SQ.USERmickael.dong.R1.fastq.zst \
        -2 ChIP-seq_Xie_Lab_H3K27me1_0001AS.DCD003249SQ.USERmickael.dong.R2.fastq.zst \
        -S out_test.sam \
        --phred33 \
        -p 10
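
Since the error claims that the -1 file has fewer reads than the -2 file, it may help to rule out a problem in the compressed files themselves. The following is a minimal sketch of such a check, not something Bowtie2 provides: the function names are made up for illustration, it assumes the `zstd` command-line tool is on the PATH, and it assumes plain 4-line FASTQ records (no wrapped sequence lines).

```python
import subprocess
from typing import Iterable


def count_fastq_reads(lines: Iterable[bytes]) -> int:
    """Count FASTQ records, assuming exactly 4 lines per read."""
    n_lines = sum(1 for _ in lines)
    if n_lines % 4 != 0:
        raise ValueError(f"line count {n_lines} is not a multiple of 4")
    return n_lines // 4


def count_reads_in_zst(path: str) -> int:
    """Stream-decompress `path` with `zstd -dc` and count the reads."""
    with subprocess.Popen(["zstd", "-dc", path], stdout=subprocess.PIPE) as proc:
        n_reads = count_fastq_reads(proc.stdout)
    # Leaving the `with` block waits for zstd, so returncode is set here.
    if proc.returncode != 0:
        raise RuntimeError(f"zstd exited with {proc.returncode} on {path}")
    return n_reads
```

If `count_reads_in_zst` returns the same number for the R1 and R2 `.zst` files, the files are consistent and the mismatch is more likely introduced while Bowtie2 reads them.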
Results
Zstd level | Crash | Time |
---|---|---|
19 | yes | 3m43s |
18 | yes | 7m14s |
16 | no | 8m59s |
12 | no | 8m58s |
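
A sweep over compression levels like the one in the table could be scripted roughly as follows. This is only a sketch under assumptions: the helper names and the per-level output directories (`zstd_level_<N>/`) are hypothetical, the zstd and Bowtie2 flags are the ones used in the commands above plus zstd's `-f` and `-o`, and a non-zero exit status from `./bowtie2` is taken to mean a crash.

```python
import subprocess
from pathlib import Path
from typing import List


def zst_path(fastq: str, level: int) -> Path:
    """Hypothetical layout: one directory per level, file name kept as <name>.zst."""
    return Path(f"zstd_level_{level}") / (Path(fastq).name + ".zst")


def zstd_cmd(fastq: str, level: int, threads: int = 4) -> List[str]:
    """Build the zstd command that compresses `fastq` at the given level."""
    return ["zstd", f"-{level}", f"-T{threads}", "-f",
            "-o", str(zst_path(fastq, level)), fastq]


def bowtie2_cmd(index: str, r1: str, r2: str, sam: str, threads: int = 10) -> List[str]:
    """Build the paired-end Bowtie2 command used in the reproduction steps."""
    return ["./bowtie2", "-x", index, "-1", r1, "-2", r2,
            "-S", sam, "--phred33", "-p", str(threads)]


def run_level(index: str, r1_fastq: str, r2_fastq: str, level: int) -> bool:
    """Compress both files at `level`, align them, return True on a clean exit."""
    zst_path(r1_fastq, level).parent.mkdir(exist_ok=True)
    for fastq in (r1_fastq, r2_fastq):
        subprocess.run(zstd_cmd(fastq, level), check=True)
    result = subprocess.run(bowtie2_cmd(
        index,
        str(zst_path(r1_fastq, level)),
        str(zst_path(r2_fastq, level)),
        f"out_level_{level}.sam",
    ))
    return result.returncode == 0
```

Calling `run_level` for each of 12, 16, 18 and 19 with the index prefix and the two gunzipped FASTQ files would give one pass/fail result per level.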
I am using level 19 to store all our lab FASTQ files, so this is pretty bad. I would really appreciate it if you could look into it and fix it. I can test proposed solutions.
Hello,
Thank you for pointing this out. I pushed a commit to the bug_fixes branch that should address the issue.
Thanks. I tested on multiple examples that used to fail. They all pass now. So it seems to be fixed.
When do you plan to get a new release out? [the earlier the better!]
Zstandard support is currently broken in the released version and in the master branch. Merging the bug_fixes branch and releasing would be super appreciated! Is there a timeline for that? Thanks so much.
I have merged bug_fixes into the master branch. v2.5.0 will be released this week. Thank you for your patience.