SeqPrep

Reads with different lengths

Open fumi-github opened this issue 12 years ago • 1 comments

Hi

I noticed behavior that differs from what I expected when the two reads have different lengths.

My forward read is 300 bp and my reverse read is 200 bp. The reverse read is the reverse complement of positions 7 to 206 of the forward read. Positions 207 to 300 of the forward read correspond to the adapter and are removed, as they should be. Positions 1 to 6 of the forward read come from sample DNA (not from the adapter), but I noticed that SeqPrep removes them as well.
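To make the geometry concrete, here is a minimal sketch in C that locates where the reverse complement of a reverse read falls inside a forward read. The helper names (`complement`, `reverse_complement`, `overlap_offset`) are hypothetical and are not taken from SeqPrep. The sketch uses exact string matching only, whereas a real read merger would presumably have to tolerate mismatches and low-quality bases.

```c
#include <stddef.h>
#include <string.h>

/* Complement of a single base; anything other than A/C/G/T maps to 'N'. */
static char complement(char b)
{
    switch (b) {
    case 'A': return 'T';
    case 'C': return 'G';
    case 'G': return 'C';
    case 'T': return 'A';
    default:  return 'N';
    }
}

/* Write the reverse complement of src (length n) into dst.
 * dst must have room for n + 1 chars. */
static void reverse_complement(const char *src, size_t n, char *dst)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = complement(src[n - 1 - i]);
    }
    dst[n] = '\0';
}

/* Return the 0-based offset in fwd at which the reverse complement of rev
 * occurs exactly, or -1 if there is no exact occurrence (or rev is too
 * long for the local buffer). Illustrative only: exact matching, no
 * mismatch tolerance. */
static long overlap_offset(const char *fwd, const char *rev)
{
    char rc[512];
    size_t n = strlen(rev);

    if (n >= sizeof rc) {
        return -1;
    }
    reverse_complement(rev, n, rc);

    const char *hit = strstr(fwd, rc);
    return hit ? (long)(hit - fwd) : -1;
}
```

With a toy forward read `GTTCTTAGATATCGCTGTCT` and reverse read `CGATATCT`, `overlap_offset` returns 6: the overlap starts at 1-based position 7, leaving a six-base overhang at the start of the forward read, analogous to the case described here.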

I made a small change that prevents the removal, and I would like to hear whether I have made a mistake. (My change only applies when the -s option is unset.)

Please find example data and my changes below.

Many thanks for the wonderful program!

With regards, Fumi

EXAMPLE DATA

    $ cat sample_1.fastq
    @M00424:21:000000000-A2DRY:1:1101:14712:1407 1:N:0:1
    GTTCTTAGATATCTCTCATTTATAAAAGGTTATCTTAATTAAAATGGGTTTACTATATTGGGATAAATGTATAGGATAAGACAAGGACCTTTTTATACGCTCAGACAATAAAATTTTCCAACAATAACTTTCATTTACCCGATAGAGTAAATGATCAAACTGCTAATTTGATGCTTTGTATTTTTAACTATTACGGAGGTGAATCGCTGTCTCTTATACACATCTCTGAGCGGGCTGGCAAGGCAGACCGATCACGATATCGTATGCCGTCTTCTGCTTGGAAACAAAACAATACACCAT
    +
    3>AABFFFFFFFFEGGGGGGGGHHHHGHGHHHHGHHHHHHHHHFHGHEHEGBEF5G5DHEGHAGFFFABFBHG5FHFFBGFA3211CFGGHHHH1DD@1EFDDGFG2FFFBGHHHHHEGE4FFEHFFFFGHGDGHHHHHH?EGCGHHFGGHHGFHHGFFFGFGHHHHHHBGD2GGHGGFBGGHHHGHHHGGHGHGC/D@><FHHB0CGGGF0DFF<GE00<GGH00GF00CD?D?9EGGFFEG?.AEFA?B;;DFD.9.9B.9.;/;B-:;.ABF//:9/.9/B/;E.9..9//9/;/./

    $ cat sample_2.fastq
    @M00424:21:000000000-A2DRY:1:1101:14712:1407 2:N:0:1
    CGATTCACCTCCGTAATAGTTAAAAATACAAAGCATCAAATTAGCAGTTTGATCATTTACTCTATCGGGTAAATGAAAGTTATTGTTGGAAAATTTTATTGTCTGAGCGTATAAAAAGGTCCTTGTCTTATCCTATACATTTATCCCAATATAGTAAACCCATTTTAATTACGTTAACCTTTTTTAACTTATCGATATCT
    +
    A11>>D3DFFFACG1EF33FG33A1BFBF11A0A1AB111AD11011BFD11BF2DDB2A1BF12B///F/C1AB11E1DEFGF1BBBF1BEFHFHH2DG1B2BB1BEEE/?G2G11E/BBGEH1112B2>F1G2FBFHHHG2G>BB1BF1BF>FF22BF?/CFBB2BBGG1?0@@/@1@GBG1>/11?1111..<.-11

EXAMPLE RUN

    SeqPrep -f sample_1.fastq -r sample_2.fastq -1 sampleout_1.fastq.gz -2 sampleout_2.fastq.gz -A CTGTCTCTTATACACATCTC -B CTGTCTCTTATACACATCTC

CODE CHANGE FOR utils.c

    diff utils.c utils.c.original
    182,183d181
    < sqp->fseq[j1] = c1; //fumi
    < sqp->fqual[j1] = q1; //fumi
    185,186c183,184
    < // sqp->fseq[j1] = c1; //fumi moved out from if clause
    < // sqp->fqual[j1] = q1; //fumi moved out from if clause
    ---
    > sqp->fseq[j1] = c1;
    > sqp->fqual[j1] = q1;
    202c200
    < // j1++; //fumi moved out from if clause
    ---
    > j1++;
    204d201
    < j1++; //fumi

CODE CHANGE FOR utils.h (to allow long reads)

    diff utils.h utils.h.original
    13c13
    < #define MAX_SEQ_LEN (512)
    ---
    > #define MAX_SEQ_LEN (256)
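The effect of moving the copy out of the if clause can be illustrated with a small sketch. This is not SeqPrep's actual loop: the function `trim_forward`, its parameters, and the `keep_overhang` flag are hypothetical names chosen only to contrast the two behaviors, assuming the overlap region on the forward read is already known.

```c
#include <stddef.h>

/* Hypothetical sketch, not SeqPrep's code.
 *
 * Copies bases of the forward read into out, stopping at overlap_end so
 * that the adapter after the overlap is dropped.
 *   overlap_start, overlap_end: 0-based half-open range of fwd covered by
 *                               the reverse read; the caller must ensure
 *                               overlap_end <= strlen(fwd).
 *   keep_overhang == 0: bases before overlap_start are skipped, which
 *                       mirrors the behavior reported in the issue.
 *   keep_overhang != 0: those bases are copied unconditionally, which
 *                       mirrors the proposed change.
 * out must have room for overlap_end + 1 chars.
 * Returns the number of bases written. */
static size_t trim_forward(const char *fwd, size_t overlap_start,
                           size_t overlap_end, int keep_overhang, char *out)
{
    size_t j = 0;

    for (size_t i = 0; i < overlap_end; i++) {
        int covered = (i >= overlap_start);
        if (covered || keep_overhang) {
            out[j++] = fwd[i];
        }
    }
    out[j] = '\0';
    return j;
}
```

For a toy forward read `GTTCTTAGATATCGCTGTCT` with the overlap at `[6, 14)`, the call with `keep_overhang` set to 0 yields `AGATATCG`, while the call with `keep_overhang` set to 1 yields `GTTCTTAGATATCG`, keeping the six leading sample bases and still dropping the adapter.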

fumi-github, Mar 12 '13 03:03

Hi Fumi,

Thanks for making this change! I am really swamped with work right now, so it may be a few weeks before I can review and push your change back to my repository. Glad you are finding it useful!

Best,
John


jstjohn, Mar 12 '13 16:03