SeqPrep
ERROR: Fastq id lines do not match
Hello,
I am running SeqPrep and got this error:
Fastq id lines do not match: SRR1640752.230 230 length=91 vs SRR1640752.230 230 length=250
From what I understand, the program complains because the number of characters in "SRR1640752.230 230 length=91" is not the same as the number of characters in "SRR1640752.230 230 length=250". The error does not occur when the forward and reverse read lengths differ but the ID lines have the same number of characters, e.g. "SRR1640752.230 230 length=249" vs "SRR1640752.230 230 length=250".
Can you help me fix this error? Many thanks for your help! Chloé
Oh, interesting. Yes, that check is in place to catch the situation where the reads get out of sync. The length=91 field seems to be confusing it. You could probably disable that read-name length check in the C code and recompile. Let me know if you have a hard time identifying the line of code to comment out, and I can help!
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/jstjohn/SeqPrep/issues/41, or mute the thread https://github.com/notifications/unsubscribe-auth/AAcBBs0cbSob6Bqyecf66qH8_9vTbW48ks5syzIrgaJpZM4QRRFe .
Thank you! Yes, I would appreciate your help identifying the lines I should disable in the code.
Should I remove lines 854-865 in utils.c?
bool f_r_id_check( char fid[], size_t fid_len, char rid[], size_t rid_len ) {
  if(fid_len != rid_len){
    goto bad_read;
  //}else if (strncmp( fid, rid, fid_len - 2) == 0 ) {
  }else{
    return true;
  }
  bad_read:
  fprintf(stderr,"ERROR: Fastq id lines do not match: %s vs %s \n", fid, rid);
  return false;
}
So that's the function that does the comparison. Your quickest option would be to add a return true; as the first line of the function (before that first if), and then comment out the remainder of the function up to the closing brace so it is still valid code. Commenting out is optional, but it makes clear that the rest is dead code. That will pass every read on ID checking.
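A less drastic alternative to passing every read, sketched here as an assumption rather than anything that exists in SeqPrep: compare only the read name, meaning the part of each ID line before the first whitespace, so that trailing fields such as length=91 and length=250 are ignored. The names read_name_len and read_names_match are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of SeqPrep: length of the read name,
 * i.e. everything before the first whitespace in the ID line. */
size_t read_name_len(const char *id) {
    return strcspn(id, " \t\r\n");
}

/* Hypothetical stand-in for the length-only check: two ID lines match
 * when their read names are identical. Anything after the first
 * whitespace (for example "230 length=91") is ignored. */
bool read_names_match(const char *fid, const char *rid) {
    size_t flen = read_name_len(fid);
    size_t rlen = read_name_len(rid);
    return flen == rlen && strncmp(fid, rid, flen) == 0;
}
```

With this comparison the two SRR1640752.230 lines from the first post would count as a pair, while reads with different names would still be rejected. IDs that mark the mate with a /1 or /2 suffix inside the name would need extra handling.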
Hello,
I have the same problem as the first person. I have a pair of randomly subsampled FASTQ files (forward and reverse) made with the seqtk package (we subset the original FASTQ files for an upcoming workshop). Everything was okay up to that point. Then I wanted to use SeqPrep to trim and merge, and the 'ERROR: Fastq id lines do not match' error came up. Below is my command:
seqprep -6 \
    -f ww1e_r1.fq.gz \
    -r ww1e_r2.fq.gz \
    -A AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
    -B AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
    -1 ./out/ww1e_trimmed_r1.fq.gz \
    -2 ./out/ww1e_trimmed_r2.fq \
    -s ./out/ww1e_merged_s.fastq.gz \
    -E ./info/ww1e_alignments_merged.txt.gz
and the results:
$ seqprep -6 \
    -f ww1i_r1.fq.gz \
    -r ww1i_r2.fq.gz \
    -A AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
    -B AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
    -1 ./out/ww1i_trimmed_r1.fq.gz \
    -2 ./out/ww1i_trimmed_r2.fq \
    -s ./out/ww1i_merged_s.fastq.gz \
    -E ./info/ww1i_alignments_merged.txt.gz
./out/ww1i_trimmed_r1.fq.gz Cannot open file: No such file or directory
./out/ww1i_trimmed_r2.fq Cannot open file: No such file or directory
./out/ww1i_merged_s.fastq.gz Cannot open file: No such file or directory
./info/ww1i_alignments_merged.txt.gz Cannot open file: No such file or directory
ERROR: Fastq id lines do not match: A00582:269:H5KK5DSXY:3:2556:3278:9518 1:N:0:CATACCAA vs A00583:262:H3537DSXY:2:1249:1994:18317 2:N:0:ACCACTGT
Pairs Processed: 0
Pairs Merged: 1
Pairs With Adapters: 0
Pairs Discarded: 0
CPU Time Used (Minutes): 0.000048
Any suggestions on how to make SeqPrep ignore this problem? I don't really understand the earlier suggestion to add 'return true;' in utils.c.
Thank you!
You wouldn't want to ignore that. It's saying that read 1 and read 2 are not actually from the same fragment! I would do some manual QC on the data you are feeding into SeqPrep and make sure that one of your other tools isn't doing something like discarding one read and not its mate, or that the files aren't mismatched in some way.
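One way to do that manual QC, as a rough sketch: walk both FASTQ files in lockstep and report the first record whose read names differ. This assumes uncompressed, four-line FASTQ records with every line shorter than MAX_LINE, so gzipped inputs would need decompressing first. The helpers next_read_name and first_unpaired_record are hypothetical and are not part of SeqPrep or seqtk.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE 4096  /* assumed upper bound on the length of any line */

/* Read one four-line FASTQ record and copy its read name (the ID line up
 * to the first whitespace) into name. Returns 1 on success, 0 at end of
 * file or on a truncated record. */
int next_read_name(FILE *fq, char *name, size_t name_size) {
    char line[MAX_LINE];
    if (fgets(line, sizeof line, fq) == NULL) return 0;
    size_t n = strcspn(line, " \t\r\n");
    if (n >= name_size) n = name_size - 1;
    memcpy(name, line, n);
    name[n] = '\0';
    /* Skip the sequence, separator, and quality lines. */
    for (int i = 0; i < 3; i++) {
        if (fgets(line, sizeof line, fq) == NULL) return 0;
    }
    return 1;
}

/* Returns 0 if every record pairs up, the 1-based index of the first
 * record whose read names differ, or -1 if one file ends before the other. */
long first_unpaired_record(FILE *r1, FILE *r2) {
    char n1[MAX_LINE], n2[MAX_LINE];
    long index = 0;
    for (;;) {
        int ok1 = next_read_name(r1, n1, sizeof n1);
        int ok2 = next_read_name(r2, n2, sizeof n2);
        if (!ok1 && !ok2) return 0;   /* both files ended together */
        if (ok1 != ok2) return -1;    /* unequal number of records */
        index++;
        if (strcmp(n1, n2) != 0) return index;
    }
}
```

Called on the two input files opened with fopen, a result of 0 means the names line up record by record. Any other result points at where the files fall out of sync, which is the condition the SeqPrep error is reporting.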
Also, you're writing one output file as .fq and the other as .fq.gz. Pretty sure you didn't mean to do that?
Makes sense. I'll look into the data first. We assumed there wouldn't be a problem after we randomly subsampled it. The original raw data from our experiment works just fine with SeqPrep.
We probably need to change how we subset the FASTQ files. Thanks! Hope everything goes smoothly after this!