gem3-mapper

Parsing FASTA/FASTQ error.

Open Luobiny opened this issue 3 years ago • 3 comments

Hi GEM developers, I am running into a FASTA/FASTQ parsing error when running gem-mapper on a pair of FASTQ files. The FASTQ files themselves do not seem to be the problem, because I can align them with BWA without any error. Also, when I split the FASTQ files into smaller chunks (32GB or less per chunk), GEM aligns each chunk without any error. The error messages look like the following:

   2021/11/24 10:45:00 -- # 146400000 sequences processed
    GEM::FatalError (input_fasta.c:299,input_fasta_parser_prompt_error)
     Parsing FASTA/FASTQ error(R1.fq:3473248862119260163). Beginning Symbol ('>' or '@') not found. Bad syntax
    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    GEM::Unexpected error occurred. Sorry for the inconvenience
         Feedback and bug reporting it's highly appreciated,
         => Please report or email ([email protected])
    GEM::Running-Thread (threadID = 14)
    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    GEM::Version v3.6.0-bundle-release
    GEM::CMD gem-mapper -I human_g1k_v37_decoy_phiXAdaptr.gem -1 R1.fq -2 R2.fq -t 18 -r @RG        ID:sample       PL:ILLUMINA     LB:sample       SM:sample       CN:sample PU:sample
    

Another interesting thing is that when I launch another run with the same FASTQ files, the number of sequences processed that the log reports right before the error is a bit different, although it seems to stay between 144,000,000 and 150,000,000.
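In case it helps anyone hitting the same error, here is a minimal sketch of the chunking workaround mentioned above. It is not part of GEM: the function name `split_fastq` and the output naming scheme are made up for illustration, and it assumes strict 4-line FASTQ records (no wrapped sequence lines). It splits by record count rather than by file size, so that running it on R1 and R2 with the same count keeps mates in chunks with the same index.

```python
from pathlib import Path


def split_fastq(path, out_dir, records_per_chunk):
    """Split a FASTQ file into chunks of at most `records_per_chunk` records.

    Hypothetical helper, not part of GEM. Assumes every record is exactly
    4 lines (header, sequence, '+', qualities). Returns the chunk paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(path).stem
    lines_per_chunk = 4 * records_per_chunk
    chunks = []
    dst = None
    try:
        with open(path, "rb") as src:
            for line_no, line in enumerate(src):
                # Start a new chunk file every `lines_per_chunk` lines.
                if line_no % lines_per_chunk == 0:
                    if dst is not None:
                        dst.close()
                    chunk_path = out_dir / f"{stem}.part{len(chunks):04d}.fq"
                    chunks.append(chunk_path)
                    dst = open(chunk_path, "wb")
                # The first line of each record must be a '@' header.
                if line_no % 4 == 0 and not line.startswith(b"@"):
                    raise ValueError(
                        f"{path}: line {line_no + 1} does not start with '@'"
                    )
                dst.write(line)
    finally:
        if dst is not None:
            dst.close()
    return chunks
```

Calling `split_fastq("R1.fq", "chunks", 50_000_000)` and `split_fastq("R2.fq", "chunks", 50_000_000)` (the chunk size here is arbitrary) gives matching `R1.partNNNN.fq` / `R2.partNNNN.fq` pairs that can each be passed to gem-mapper with `-1` and `-2` as in the command above. The header check also doubles as a quick sanity test that the input itself has no record that fails to begin with `@`.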

Luobiny, Nov 24 '21 21:11

There seems to be a bug in the parsing module with very large FASTQ files. We will investigate it. Thanks for the report.

smarco, Nov 25 '21 09:11

I've used gem-mapper to align many other very large FASTQ files from human whole genome sequencing, and none of them caused any error. I can share these two FASTQ files if that will help you reproduce the problem and debug the code.

Luobiny, Nov 25 '21 18:11

Well, sure. If you can make those files available to us, it would be of great help. Thanks,

smarco, Nov 26 '21 13:11