SeqPrep Segmentation fault while merging PE reads


Hi, I am encountering an issue (exit code 139) with this error message:

Processing reads... |/tmp/.lsbtmp4010/.lsbatch/1433770267.781893: line 8: 10935 Segmentation fault (core dumped) /nfs/seqdb/production/interpro/development/metagenomics/pipeline/tools/bin/SeqPrep -f ERR884064_1.fastq -r ERR884064_2.fastq -1 ERR884064_1_paired.fastq.gz -2 ERR884064_2_paired.fastq.gz -3 ERR884064_1_unpaired.fastq.gz -4 ERR884064_2_unpaired.fastq.gz -s ERR884064_paired.fastq.gz

I checked the read files: they do not contain non-ASCII characters, and every quality score line has the same length as its sequence line. I have successfully run SeqPrep with the same parameters before and since, so the installation is correct. Any suggestions on how to merge these files successfully?

Thanks, Hubert
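For anyone who wants to repeat the two checks mentioned above (ASCII-only content, and quality lines matching their sequence lines in length), here is a minimal sketch in Python. It assumes plain four-line FASTQ records, and the function name `check_fastq` is made up for illustration; it is not part of SeqPrep.

```python
def check_fastq(handle):
    """Yield (record_number, problem) for each problem found in a 4-line FASTQ stream."""
    record = 0
    while True:
        header = handle.readline()
        if not header:          # end of file
            break
        if not header.strip():  # tolerate stray blank lines
            continue
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        record += 1
        if not header.startswith("@"):
            yield record, "header does not start with '@'"
        if not plus.startswith("+"):
            yield record, "separator line does not start with '+'"
        if len(seq) != len(qual):
            yield record, f"sequence length {len(seq)} != quality length {len(qual)}"
        if not (header + seq + qual).isascii():
            yield record, "non-ASCII character"
```

Opening the file with `open(path, encoding="latin-1")` avoids decode errors, so that any non-ASCII byte is reported rather than crashing the script. As the rest of this thread shows, a file can pass both checks and still trigger the segmentation fault.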

Jun 11 '15 10:06 hudenise

After investigating the files, the issue was that a few sequences from the _1 file were quite short (<10 nt) while their counterparts from the _2 file were either significantly longer or also very short.

Example from the _1 file:

@MISEQ .... TGCAGGATATCGCGGCCGT + BCC-@ECFGGD7F7@FE+6

and its counterpart in the _2 file:

TATCCGTTACCTATCGTCCGCGAGAAAGCTAGTAGACACACAGCACCCAGGCGTGCAAGTCACCTTCAGATGACTACACCGAACCTGGTTAAAAGAGTCTATGGCCACCCCTACTTTAGAGTAAAAAAACCACACCTCTATTGCGCTGGGTACTAGAATAAGCTAACTACCTAGTCCGTTTCCGGCTGACTTTTTTGGGAATAACATACCACCCATCGTGATTACGTTCGCCACCGTTCTACTGCTCTCTTCACTAGGTTTGCACATTGTTTGTTCCCCTATGGCTAATTTATAGAGGACN + -6A,A@D<@,CFC,C,E86C+:++8CE,,C,6,;,,CAFGGCE,,C66DE,,C:@C,9,,,<6,6CE<,,,,<,,C,,<,7++8+B88,AFF@F,,,,:5?B,,AECFG:+AD,5AF9;,?,4,C,,+++488,=+,=,,,73,+6@6+3,26=,@,733=6,,==FCG,@,,45<?6,,1,+***3,95DD,__3__1/,0+;9+80)0A)/4/))..););6;()/2).))./)0);)1474)4?4)6))))(.640)))1)4)).,,()),8((,.8((-)).9)-4...((,!

This is the first time I have encountered this issue, so I don't know whether you are aware of it. Cheers, Hubert
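To locate such pairs before running SeqPrep, the two input files can be scanned in parallel and every pair with a very short mate reported. The sketch below is one way to do that in Python. The helper names and the default threshold of 20 nt are arbitrary choices for illustration, not values taken from SeqPrep, and the code assumes four-line FASTQ records in the same order in both files.

```python
import gzip
from itertools import zip_longest


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file as text."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", encoding="latin-1")
    return open(path, encoding="latin-1")


def read_records(handle):
    """Yield (name, sequence) for each 4-line FASTQ record."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        if not header.strip():  # tolerate stray blank lines
            continue
        seq = handle.readline().strip()
        handle.readline()       # '+' separator line, ignored
        handle.readline()       # quality line, ignored
        yield header.strip(), seq


def short_pairs(forward, reverse, min_len=20):
    """Yield (pair_index, forward_name, len_fwd, len_rev) when either mate is shorter than min_len."""
    pairs = zip_longest(read_records(forward), read_records(reverse))
    for index, (fwd, rev) in enumerate(pairs, start=1):
        if fwd is None or rev is None:
            raise ValueError(f"files differ in record count at pair {index}")
        if len(fwd[1]) < min_len or len(rev[1]) < min_len:
            yield index, fwd[0], len(fwd[1]), len(rev[1])
```

With the file names from the first post, this would be called as `short_pairs(open_fastq("ERR884064_1.fastq"), open_fastq("ERR884064_2.fastq"))`. The output only lists candidate pairs; whether a given pair actually triggers the crash is not something this thread establishes.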

Jul 02 '15 09:07 hudenise

Looks like your bcl2fastq job is already doing some kind of trimming for you. This is not expected input for SeqPrep. If you have a say over the bcl2fastq parameters, maybe you could turn this off? I'm not sure which settings or defaults would do this in your version.

Jul 02 '15 13:07 jstjohn

Thanks, I will forward your email to the user who generated the sequences submitted to our pipeline. Cheers, Hubert

Dr Hubert DENISE

Metagenomics European Bioinformatics Institute (EMBL-EBI) European Molecular Biology Laboratory Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom Tel : (+44)01223 494102

Jul 02 '15 14:07 hudenise

Hello, I am using SeqPrep after Trimmomatic (which trims the reads), and for some files I get this error:

/tmp/sge_spool/lhi10/job_scripts/19611347: line 20: 15553 Segmentation fault (core dumped) SeqPrep -f trimmomatic_input_1P.fastq -r trimmomatic_input_2P.fastq -1 seqprep_1_trimmed.fastq.gz -2 seqprep_2_trimmed.fastq.gz -3 seqprep_1_notmerged.fastq.gz -4 seqprep_2_notmerged.fastq.gz -A AGATCGGAAGAGCACACGTCT -B AGATCGGAAGAGCGTCGTGTA -L 20 -o 40 -s seqprep_merged.fastq.gz -2 file_seqprep.txt.gz 2>> seqprep.log

Above, you mention that the segmentation fault may be explained by the two reads of a pair in the SeqPrep input not having the same length. Does this mean that Trimmomatic should not be used before SeqPrep, and that the program expects both reads of a pair to have the same length?

Many thanks for your help on this issue, Chloé

Nov 15 '16 09:11 chloeloiseau

Dear Chloe, Indeed, we run SeqPrep upstream of Trimmomatic, on the raw reads with only the primers/adapters removed. We then apply Trimmomatic to the merged file. Sincerely, Hubert

Nov 15 '16 10:11 hudenise
