
Bug: Bowtie2 crashes on Zstandard-compressed FASTQ files with high compression level

Open · vejnar opened this issue 2 years ago · 2 comments

Using the latest Bowtie2 (f515741, which includes #402), FASTQ files compressed with Zstandard cause a crash with this error:

Error, fewer reads in file specified with -1 than in file specified with -2
terminate called after throwing an instance of 'int'
(ERR): bowtie2-align died with signal 6 (ABRT) (core dumped)

The crash seems to depend on the compression level used to create the .zst files. With a low level (3), Bowtie2 finishes. With higher levels (18 or 19), it crashes. I have seen the problem with many of my FASTQ files. Whether it crashes seems to depend on file size, read length and compression level.

Bowtie2 was compiled using make -j 10 WITH_ZSTD=1.

I managed to reproduce the problem with publicly available FASTQ files and a zebrafish GRCz11 index (built from the Ensembl FASTA).

  • https://danio-code.zfin.org/files/annotated_files/ChIP-seq/DCD007385BS/ChIP-seq_Xie_Lab_H3K27me1_0001AS.DCD003249SQ.USERmickael.dong.R1.fastq.gz
  • https://danio-code.zfin.org/files/annotated_files/ChIP-seq/DCD007385BS/ChIP-seq_Xie_Lab_H3K27me1_0001AS.DCD003249SQ.USERmickael.dong.R2.fastq.gz

To reproduce:

  1. Gunzip both files.
  2. Run zstd -19 -T26 *.fastq (lower -T if you have fewer cores).
  3. Run:
    ./bowtie2 -x danrer_genome_all_ensembl_grcz11/seq \
    -1 ChIP-seq_Xie_Lab_H3K27me1_0001AS.DCD003249SQ.USERmickael.dong.R1.fastq.zst \
    -2 ChIP-seq_Xie_Lab_H3K27me1_0001AS.DCD003249SQ.USERmickael.dong.R2.fastq.zst \
    -S out_test.sam \
    --phred33 \
    -p 10
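The error claims that the -1 file has fewer reads than the -2 file. You can check whether the .zst files themselves are intact by decompressing each one independently and counting records. Below is a minimal sketch. It assumes the zstd CLI is on PATH and that every FASTQ record spans exactly four lines (no wrapped sequence lines). The script and function names are hypothetical and are not part of Bowtie2.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare read counts of two zstd-compressed FASTQ files.
# Assumptions: the zstd CLI is on PATH and each record spans exactly 4 lines.
set -euo pipefail

# Read FASTQ text on stdin and print the number of records.
# Exits non-zero if the line count is not a multiple of 4 (truncated or corrupt input).
count_records() {
  awk 'END {
    if (NR % 4 != 0) { print "line count " NR " is not a multiple of 4" > "/dev/stderr"; exit 1 }
    printf "%d\n", NR / 4
  }'
}

# Decompress one .zst file to stdout with the zstd CLI and count its records.
count_zst_records() {
  zstd -dc -- "$1" | count_records
}

# Print both counts; return success only if they match.
compare_pair() {
  local n1 n2
  n1=$(count_zst_records "$1")
  n2=$(count_zst_records "$2")
  echo "$1: $n1 reads"
  echo "$2: $n2 reads"
  [ "$n1" -eq "$n2" ]
}

# Run only when called with exactly two file arguments, e.g.
#   ./check_pair.sh R1.fastq.zst R2.fastq.zst
if [ "$#" -eq 2 ]; then
  compare_pair "$1" "$2"
fi
```

The files may decompress cleanly with zstd and report equal counts while Bowtie2 still reports a mismatch. That result would suggest the problem lies in how Bowtie2 reads the compressed stream, not in the files.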
    

Results

Zstd level   Crash   Run time
19           yes     3m43s
18           yes     7m14s
16           no      8m59s
12           no      8m58s

I use level 19 to store all of our lab's FASTQ files, so this is a serious problem for us. I would really appreciate it if you could look into it and fix it. I can test proposed solutions.

vejnar, Aug 17 '22

Hello,

Thank you for pointing this out. I pushed a commit to the bug_fixes branch that should address the issue.

ch4rr0, Aug 24 '22

Thanks. I tested on multiple examples that used to fail. They all pass now. So it seems to be fixed.

When do you plan to get a new release out? (The earlier the better!)

vejnar, Aug 24 '22

Zstandard support is currently broken in both the latest release and the master branch. I would be very grateful if you could merge the bug_fixes branch and make a release. Is there a timeline for that? Thanks so much.

vejnar, Oct 17 '22

I have merged bug_fixes into the master branch. v2.5.0 will be released this week. Thank you for your patience.

ch4rr0, Oct 25 '22

Fixed in v2.5.0. Thanks so much!

The AUR package is already updated.

vejnar, Nov 01 '22