gem3-mapper
Parsing FASTA/FASTQ error.
Hi GEM developers, I am running into a "Parsing FASTA/FASTQ error" when running gem-mapper on a pair of FASTQ files. The FASTQ files themselves do not seem to be the problem, because I can align them with BWA without any error. Also, when I split the FASTQ files into smaller chunks (32GB or less per chunk), GEM aligns each chunk without any error. The error messages look like this:
2021/11/24 10:45:00 -- # 146400000 sequences processed
GEM::FatalError (input_fasta.c:299,input_fasta_parser_prompt_error)
Parsing FASTA/FASTQ error(R1.fq:3473248862119260163). Beginning Symbol ('>' or '@') not found. Bad syntax
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
GEM::Unexpected error occurred. Sorry for the inconvenience
Feedback and bug reporting it's highly appreciated,
=> Please report or email ([email protected])
GEM::Running-Thread (threadID = 14)
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
GEM::Version v3.6.0-bundle-release
GEM::CMD gem-mapper -I human_g1k_v37_decoy_phiXAdaptr.gem -1 R1.fq -2 R2.fq -t 18 -r @RG ID:sample PL:ILLUMINA LB:sample SM:sample CN:sample PU:sample
Another interesting thing is that when I launch another run with the same FASTQ files, the number of sequences processed reported in the log right before the error is a bit different each time, even though it seems to stay between 144,000,000 and 150,000,000.
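For anyone hitting the same error, the chunking workaround mentioned above can be sketched roughly as follows. This is a minimal sketch, not part of GEM: the function name `split_fastq`, the chunk naming scheme, and the assumption that every record is exactly four lines (no wrapped sequence lines) are all mine.

```python
def split_fastq(path, records_per_chunk, out_prefix):
    """Split a FASTQ file into numbered chunk files (hypothetical helper).

    Assumes strict 4-line records: header, sequence, '+', qualities.
    Returns the list of chunk file paths in order.
    """
    lines_per_chunk = 4 * records_per_chunk
    chunk_paths = []
    dst = None
    total_lines = 0
    try:
        with open(path, "rb") as src:
            for line in src:
                # Every 4th line, starting at the first, must be a header.
                if total_lines % 4 == 0 and not line.startswith(b"@"):
                    raise ValueError(
                        f"{path}: line {total_lines + 1} does not start with '@'"
                    )
                # Start a new chunk file on each chunk boundary.
                if total_lines % lines_per_chunk == 0:
                    if dst is not None:
                        dst.close()
                    chunk_path = f"{out_prefix}.{len(chunk_paths):04d}.fq"
                    dst = open(chunk_path, "wb")
                    chunk_paths.append(chunk_path)
                dst.write(line)
                total_lines += 1
    finally:
        if dst is not None:
            dst.close()
    if total_lines % 4 != 0:
        raise ValueError(f"{path}: truncated final record ({total_lines} lines)")
    return chunk_paths
```

Calling it with the same `records_per_chunk` on R1.fq and R2.fq should keep the mates in matching chunk files, so each chunk pair can be mapped separately with the same gem-mapper options as the full run.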
There seems to be a bug in the parsing module when handling very large FASTQ files. We will investigate it. Thanks for the report.
I've used gem-mapper to align many other very large FASTQ files from human whole genome sequencing, and they didn't cause any error. I can share these two FASTQ files if that will help you reproduce the problem and debug the code.
Well, sure. If you can make those files available to us, it would be of great help. Thanks,