SeqPrep
Segmentation fault while merging PE reads
Hi, I am encountering an issue (exit code 139) with this error message:
Processing reads... |/tmp/.lsbtmp4010/.lsbatch/1433770267.781893: line 8: 10935 Segmentation fault (core dumped) /nfs/seqdb/production/interpro/development/metagenomics/pipeline/tools/bin/SeqPrep -f ERR884064_1.fastq -r ERR884064_2.fastq -1 ERR884064_1_paired.fastq.gz -2 ERR884064_2_paired.fastq.gz -3 ERR884064_1_unpaired.fastq.gz -4 ERR884064_2_unpaired.fastq.gz -s ERR884064_paired.fastq.gz
I checked the read files: they do not contain non-ASCII characters, and every quality score line has the same length as its sequence line. I have successfully run SeqPrep with the same parameters both before and since, so the installation is correct. Any suggestions on how to merge these files successfully? Thanks, Hubert
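For readers who want to repeat the two sanity checks described above (ASCII-only content, and quality lines as long as their sequence lines), here is a minimal Python sketch. It is not part of SeqPrep; the function names `open_text` and `check_fastq` are made up for illustration, and it assumes strict four-line FASTQ records.

```python
import gzip
from itertools import islice

def open_text(path):
    """Open a plain or gzip-compressed FASTQ file as ASCII text.

    Bytes that are not valid ASCII are replaced with U+FFFD so they can be
    detected afterwards instead of raising an exception.
    """
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", encoding="ascii", errors="replace")
    return open(path, "r", encoding="ascii", errors="replace")

def check_fastq(path):
    """Return (record_number, problem) tuples for a four-line-per-record FASTQ file."""
    problems = []
    with open_text(path) as handle:
        record = 0
        while True:
            lines = [line.rstrip("\n") for line in islice(handle, 4)]
            if not lines:
                break
            record += 1
            if len(lines) < 4:
                problems.append((record, "truncated record"))
                break
            header, seq, plus, qual = lines
            if not header.startswith("@") or not plus.startswith("+"):
                problems.append((record, "bad header or separator line"))
            if len(seq) != len(qual):
                problems.append((record, "sequence and quality lengths differ"))
            if "\ufffd" in "".join(lines):
                problems.append((record, "non-ASCII character"))
    return problems
```

An empty list from `check_fastq("ERR884064_1.fastq")` would mean the file passes both checks. It would not rule out other causes of a crash.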
After investigating the files, the issue was that a few sequences from the _1 file were quite short (<10 nt) while their counterparts from the _2 file were either significantly longer or also very short.
Example from the _1 file:
@MISEQ ....
TGCAGGATATCGCGGCCGT
+
BCC-@ECFGGD7F7@FE+6
and the counterpart in the _2 file:
TATCCGTTACCTATCGTCCGCGAGAAAGCTAGTAGACACACAGCACCCAGGCGTGCAAGTCACCTTCAGATGACTACACCGAACCTGGTTAAAAGAGTCTATGGCCACCCCTACTTTAGAGTAAAAAAACCACACCTCTATTGCGCTGGGTACTAGAATAAGCTAACTACCTAGTCCGTTTCCGGCTGACTTTTTTGGGAATAACATACCACCCATCGTGATTACGTTCGCCACCGTTCTACTGCTCTCTTCACTAGGTTTGCACATTGTTTGTTCCCCTATGGCTAATTTATAGAGGACN
+
-6A,A@D<@,CFC,C,E86C+:++8CE,,C,6,;,,CAFGGCE,,C66DE,,C:@C,9,,,<6,6CE<,,,,<,,C,,<,7++8+B88,AFF@F,,,,:5?B,,AECFG:+AD,5AF9;,?,4,C,,+++488,=+,=,,,73,+6@6+3,26=,@,733=6,,==FCG,@,,45<?6,,1,+***3,95DD,__3__1/,0+;9+80)0A)/4/))..););6;()/2).))./)0);)1474)4?4)6))))(.640)))1)4)).,,()),8((,.8((-)).9)-4...((,!
This is the first time I have encountered this issue, so I don't know whether you are aware of it. Cheers, Hubert
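To find pairs like the one described above in other datasets, a small script can walk both files in parallel and flag every pair in which one mate is very short. This is only a sketch: the helper names and the default cut-off of 20 nt are illustrative assumptions, not values taken from SeqPrep. It assumes uncompressed four-line FASTQ files with reads in the same order in both files.

```python
from itertools import islice

def read_lengths(path):
    """Yield (read_id, sequence_length) for each record of an uncompressed FASTQ file."""
    with open(path) as handle:
        while True:
            record = list(islice(handle, 4))
            if len(record) < 4:
                break
            # Keep only the part of the header before the first whitespace.
            parts = record[0][1:].split()
            read_id = parts[0] if parts else ""
            yield read_id, len(record[1].strip())

def short_pairs(forward_path, reverse_path, min_length=20):
    """List (forward_id, forward_length, reverse_length) for pairs with a short mate.

    min_length is a hypothetical threshold chosen for illustration.
    """
    flagged = []
    pairs = zip(read_lengths(forward_path), read_lengths(reverse_path))
    for (fwd_id, fwd_len), (_rev_id, rev_len) in pairs:
        if fwd_len < min_length or rev_len < min_length:
            flagged.append((fwd_id, fwd_len, rev_len))
    return flagged
```

Calling `short_pairs("ERR884064_1.fastq", "ERR884064_2.fastq")` would list the suspicious pairs. Whether removing them avoids the segmentation fault is not confirmed anywhere in this thread.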
It looks like your bcl2fastq job is already doing some kind of trimming for you. This is not expected input for SeqPrep. If you have a say over the bcl2fastq parameters, maybe you could turn this off? I am not sure which settings or defaults would do this in your version.
Thanks, I will forward your email to the user who generated the sequences submitted to our pipeline. Cheers, Hubert
Dr Hubert DENISE
Metagenomics European Bioinformatics Institute (EMBL-EBI) European Molecular Biology Laboratory Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom Tel : (+44)01223 494102
Hello, I am using SeqPrep after Trimmomatic (which trims the reads), and for some files I get this error:
/tmp/sge_spool/lhi10/job_scripts/19611347: line 20: 15553 Segmentation fault (core dumped) SeqPrep -f trimmomatic_input_1P.fastq -r trimmomatic_input_2P.fastq -1 seqprep_1_trimmed.fastq.gz -2 seqprep_2_trimmed.fastq.gz -3 seqprep_1_notmerged.fastq.gz -4 seqprep_2_notmerged.fastq.gz -A AGATCGGAAGAGCACACGTCT -B AGATCGGAAGAGCGTCGTGTA -L 20 -o 40 -s seqprep_merged.fastq.gz -2 file_seqprep.txt.gz 2>> seqprep.log
Above, you mention that the segmentation fault may occur because the two reads of a pair in the SeqPrep input do not have the same length. Does this mean that Trimmomatic should not be used before SeqPrep, and that the program expects both reads of a pair to have the same length?
Many thanks for your help on this issue, Chloé
Dear Chloe, Indeed, we run SeqPrep upstream of Trimmomatic, on the raw reads with just the primers/adapters removed. We then apply Trimmomatic to the merged file. Sincerely, Hubert
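The order described above (merge first, trim afterwards) could be scripted roughly as follows. This is a sketch under stated assumptions, not the EBI pipeline itself. The SeqPrep flags and file naming mirror the command quoted at the top of this thread. The Trimmomatic jar path, the single-end `SE` mode and the `MINLEN` value are illustrative choices, since the actual trimming settings are not given here. The function `merge_then_trim_commands` is hypothetical and only builds the command lines; it does not run them.

```python
def merge_then_trim_commands(sample, trimmomatic_jar="trimmomatic.jar", min_length=36):
    """Build the two command lines, in order: SeqPrep merge, then Trimmomatic.

    trimmomatic_jar and min_length are placeholder assumptions for illustration.
    """
    merged = f"{sample}_paired.fastq.gz"
    trimmed = f"{sample}_merged_trimmed.fastq.gz"  # hypothetical output name

    # Step 1: merge the untrimmed paired-end reads with SeqPrep.
    seqprep = [
        "SeqPrep",
        "-f", f"{sample}_1.fastq",
        "-r", f"{sample}_2.fastq",
        "-1", f"{sample}_1_paired.fastq.gz",
        "-2", f"{sample}_2_paired.fastq.gz",
        "-3", f"{sample}_1_unpaired.fastq.gz",
        "-4", f"{sample}_2_unpaired.fastq.gz",
        "-s", merged,
    ]

    # Step 2: trim the merged reads as single-end data with Trimmomatic.
    trimmomatic = [
        "java", "-jar", trimmomatic_jar, "SE",
        merged, trimmed,
        f"MINLEN:{min_length}",
    ]
    return [seqprep, trimmomatic]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` in turn, so that trimming only starts once merging has succeeded.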