SeqPrep
Reads with different length
Hi
I noticed behavior that differs from what I expected when the two reads have different lengths.
My forward read is 300 bp and my reverse read is 200 bp. The reverse read is the reverse complement of positions 7 to 206 of the forward read. Positions 207 to 300 of the forward read correspond to the adapter and are removed, as they should be. Positions 1 to 6 of the forward read come from sample DNA (not from the adapter), but SeqPrep removes them as well.
I made a small change that prevents this removal, and I would like to know whether I am making a mistake. (My change only applies when the -s option is unset.)
Please find example data and my changes below.
Many thanks for the wonderful program!
With regards, Fumi
EXAMPLE DATA
$ cat sample_1.fastq
@M00424:21:000000000-A2DRY:1:1101:14712:1407 1:N:0:1
GTTCTTAGATATCTCTCATTTATAAAAGGTTATCTTAATTAAAATGGGTTTACTATATTGGGATAAATGTATAGGATAAGACAAGGACCTTTTTATACGCTCAGACAATAAAATTTTCCAACAATAACTTTCATTTACCCGATAGAGTAAATGATCAAACTGCTAATTTGATGCTTTGTATTTTTAACTATTACGGAGGTGAATCGCTGTCTCTTATACACATCTCTGAGCGGGCTGGCAAGGCAGACCGATCACGATATCGTATGCCGTCTTCTGCTTGGAAACAAAACAATACACCAT
+
3>AABFFFFFFFFEGGGGGGGGHHHHGHGHHHHGHHHHHHHHHFHGHEHEGBEF5G5DHEGHAGFFFABFBHG5FHFFBGFA3211CFGGHHHH1DD@1EFDDGFG2FFFBGHHHHHEGE4FFEHFFFFGHGDGHHHHHH?EGCGHHFGGHHGFHHGFFFGFGHHHHHHBGD2GGHGGFBGGHHHGHHHGGHGHGC/D@><FHHB0CGGGF0DFF<GE00<GGH00GF00CD?D?9EGGFFEG?.AEFA?B;;DFD.9.9B.9.;/;B-:;.ABF//:9/.9/B/;E.9..9//9/;/./
$ cat sample_2.fastq
@M00424:21:000000000-A2DRY:1:1101:14712:1407 2:N:0:1
CGATTCACCTCCGTAATAGTTAAAAATACAAAGCATCAAATTAGCAGTTTGATCATTTACTCTATCGGGTAAATGAAAGTTATTGTTGGAAAATTTTATTGTCTGAGCGTATAAAAAGGTCCTTGTCTTATCCTATACATTTATCCCAATATAGTAAACCCATTTTAATTACGTTAACCTTTTTTAACTTATCGATATCT
+
A11>>D3DFFFACG1EF33FG33A1BFBF11A0A1AB111AD11011BFD11BF2DDB2A1BF12B///F/C1AB11E1DEFGF1BBBF1BEFHFHH2DG1B2BB1BEEE/?G2G11E/BBGEH1112B2>F1G2FBFHHHG2G>BB1BF1BF>FF22BF?/CFBB2BBGG1?0@@/@1@GBG1>/11?1111..<.-11
EXAMPLE RUN
$ SeqPrep -f sample_1.fastq -r sample_2.fastq -1 sampleout_1.fastq.gz -2 sampleout_2.fastq.gz -A CTGTCTCTTATACACATCTC -B CTGTCTCTTATACACATCTC
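The overlap described in this post (the reverse read being the reverse complement of forward positions 7 to 206) can be checked independently of SeqPrep. The following is only a minimal sketch: the function names `complement`, `reverse_complement`, and `overlap_mismatches` are made up for illustration and are not part of SeqPrep.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Complement of one base; anything other than A/C/G/T maps to 'N'. */
char complement(char b) {
    switch (b) {
        case 'A': return 'T';
        case 'C': return 'G';
        case 'G': return 'C';
        case 'T': return 'A';
        default:  return 'N';
    }
}

/* Write the reverse complement of seq (length len) into out, NUL-terminated.
   out must have room for at least len + 1 bytes. */
void reverse_complement(const char *seq, size_t len, char *out) {
    for (size_t i = 0; i < len; i++) {
        out[i] = complement(seq[len - 1 - i]);
    }
    out[len] = '\0';
}

/* Hypothetical check (not SeqPrep code): count mismatches between the
   reverse complement of rev and the forward read, with the overlap
   starting at 1-based forward position start. */
size_t overlap_mismatches(const char *fwd, const char *rev, size_t start) {
    size_t flen = strlen(fwd);
    size_t rlen = strlen(rev);
    assert(start >= 1 && start - 1 + rlen <= flen);
    size_t mismatches = 0;
    for (size_t i = 0; i < rlen; i++) {
        if (fwd[start - 1 + i] != complement(rev[rlen - 1 - i])) {
            mismatches++;
        }
    }
    return mismatches;
}
```

For the two example reads one would call `overlap_mismatches(forward, reverse, 7)`. Some mismatches may still appear, since base calls toward the low-quality end of the reverse read are less reliable.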
CODE CHANGE FOR utils.c
$ diff utils.c utils.c.original
182,183d181
< sqp->fseq[j1] = c1; //fumi
< sqp->fqual[j1] = q1; //fumi
185,186c183,184
< // sqp->fseq[j1] = c1; //fumi moved out from if clause
< // sqp->fqual[j1] = q1; //fumi moved out from if clause
---
> sqp->fseq[j1] = c1;
> sqp->fqual[j1] = q1;
202c200
< // j1++; //fumi moved out from if clause
---
> j1++;
204d201
< j1++; //fumi
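To make the intended effect of the utils.c change concrete, here is a toy model of the two behaviors. It is not the SeqPrep loop itself: `trim_forward` and its parameters are hypothetical, and the sketch simply assumes that everything after the overlap is adapter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not SeqPrep code.
   fwd:           forward read (NUL-terminated)
   ov_start:      1-based forward position where the overlap begins
   ov_len:        length of the overlap
   keep_overhang: nonzero keeps the bases before the overlap,
                  zero drops them
   out:           buffer with room for strlen(fwd) + 1 bytes
   Bases after the overlap are treated as adapter and always dropped.
   Returns the number of bases written to out. */
size_t trim_forward(const char *fwd, size_t ov_start, size_t ov_len,
                    int keep_overhang, char *out) {
    size_t flen = strlen(fwd);
    assert(ov_start >= 1 && ov_start - 1 + ov_len <= flen);
    size_t ov_end = ov_start - 1 + ov_len; /* 0-based, exclusive */
    size_t j = 0;
    for (size_t i = 0; i < ov_end; i++) {
        int in_overlap = (i >= ov_start - 1);
        /* Writing the base only when in_overlap is true loses the
           overhang; writing it unconditionally keeps it. */
        if (in_overlap || keep_overhang) {
            out[j++] = fwd[i];
        }
    }
    out[j] = '\0';
    return j;
}
```

With `ov_start = 7` and `ov_len = 200` on a 300 bp forward read, `keep_overhang = 0` yields 200 bases and `keep_overhang = 1` yields 206, which is the difference described in this post.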
CODE CHANGE FOR utils.h (to allow long reads)
$ diff utils.h utils.h.original
13c13
< #define MAX_SEQ_LEN (512)
---
> #define MAX_SEQ_LEN (256)
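The reason for raising MAX_SEQ_LEN is presumably that a 300 bp read does not fit in a 256-byte buffer. The sketch below illustrates that kind of length check. `store_read` is a hypothetical helper, and the assumption that a read is stored NUL-terminated in a buffer of exactly MAX_SEQ_LEN bytes is mine, not something confirmed from the SeqPrep source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SEQ_LEN (512) /* value from the patched utils.h */

/* Hypothetical helper, not part of SeqPrep: copy a read into a fixed
   buffer of MAX_SEQ_LEN bytes, refusing reads that would not fit
   together with the terminating NUL.
   Returns 0 on success and -1 if the read is too long. */
int store_read(const char *seq, char buf[MAX_SEQ_LEN]) {
    assert(seq != NULL && buf != NULL);
    size_t len = strlen(seq);
    if (len + 1 > MAX_SEQ_LEN) {
        return -1; /* would overflow the fixed-size buffer */
    }
    memcpy(buf, seq, len + 1); /* includes the terminating NUL */
    return 0;
}
```

Under this assumption a 300 bp read is accepted with the limit at 512 but would be rejected with the limit at 256.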
Hi Fumi,
Thanks for making this change! I am really swamped with work right now, so it may be a few weeks before I can review and push your change back to my repository. Glad you are finding it useful!
Best, John
On Mar 11, 2013, at 8:35 PM, fumi-github wrote: