Difference in number of kept reads with and without 5' trimming?

Open biocyberman opened this issue 11 years ago • 2 comments

Hi, I tested the program on 3 libraries. The output is counter-intuitive to me.

With 5-prime trimming:

FastQ records kept: 48193327 FastQ records discarded: 51125213

FastQ records kept: 92263367 FastQ records discarded: 97344743

FastQ records kept: 146668253 FastQ records discarded: 154683227

No 5-prime trimming:

FastQ records kept: 48175971 FastQ records discarded: 51142569

FastQ records kept: 92226296 FastQ records discarded: 97381814

FastQ records kept: 146615764 FastQ records discarded: 154735716

You can see that running without 5-prime trimming keeps fewer reads than running with it. I wonder why that is.
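To put a number on the gap, here is a small sketch that only does arithmetic on the counts reported above. The variable and function names are made up for illustration; nothing here comes from sickle itself.

```python
# (kept, discarded) counts copied from the report above, one pair per library.
with_trim = [(48193327, 51125213), (92263367, 97344743), (146668253, 154683227)]
no_trim = [(48175971, 51142569), (92226296, 97381814), (146615764, 154735716)]


def summarize(with_trim, no_trim):
    """Return (total reads, extra reads kept with 5' trimming, percent) per library."""
    rows = []
    for (kept_w, disc_w), (kept_n, disc_n) in zip(with_trim, no_trim):
        total = kept_w + disc_w
        # Both runs should have seen the same number of input records.
        assert total == kept_n + disc_n
        extra = kept_w - kept_n
        rows.append((total, extra, 100.0 * extra / total))
    return rows


for total, extra, pct in summarize(with_trim, no_trim):
    print(f"total={total} extra_kept_with_5prime_trimming={extra} ({pct:.4f}%)")
```

The totals match between the two runs for every library, and the run with 5-prime trimming keeps 17356, 37071 and 52489 more reads respectively, so the effect is small but consistent.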

biocyberman, May 08 '14 08:05

My current finding: if I set the -x flag, sickle entirely drops reads whose first 5' window is low quality. As a result, sickle drops more reads with -x than without it. This could be a desirable feature or a bug. For colorspace sequences it is rather a feature, because we don't want to trim the 5' ends of sequences with prefix bases.
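A toy model of this behaviour, assuming a single left-to-right sliding-window scan in which the 3' cut is placed at the first low-quality window seen once the 5' cut is settled. This is a sketch for illustration, not sickle's actual code; the function name, window size and thresholds are all assumptions.

```python
def single_pass_cuts(quals, window, q_min, min_len, no_fiveprime=False):
    """Return (start, end) of the kept region, or None if the read is dropped.

    Simplified model (an assumption, not sickle's implementation): one scan
    from the 5' end. With no_fiveprime=True the 5' cut is fixed at 0, so the
    first low-quality window anywhere becomes the 3' cut.
    """
    start, end = 0, len(quals)
    found_five = no_fiveprime
    for i in range(len(quals) - window + 1):
        avg = sum(quals[i:i + window]) / window
        if not found_five and avg >= q_min:
            # 5' cut: first base in this window at or above the threshold.
            start = next(j for j in range(i, i + window) if quals[j] >= q_min)
            found_five = True
        if found_five and avg < q_min:
            # 3' cut: first base in this window below the threshold.
            end = next(j for j in range(i, i + window) if quals[j] < q_min)
            break
    if not found_five or end - start < min_len:
        return None
    return start, end


# Hypothetical read: 5 poor bases, 30 good bases, 5 poor bases.
quals = [2] * 5 + [35] * 30 + [2] * 5
print(single_pass_cuts(quals, window=4, q_min=20, min_len=20))
print(single_pass_cuts(quals, window=4, q_min=20, min_len=20, no_fiveprime=True))
```

In this model the read is kept as bases 5 to 35 when 5' trimming is on, but with 5' trimming off the very first window is already below the threshold, the 3' cut lands at position 0, and the read falls under the length cutoff and is discarded.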

However, to handle the -x flag properly, I think it would be better if sliding_window worked in both directions: from the 5' end to find the 5' cut, and from the 3' end to find the 3' cut. This would slow the program down a bit, but it would be more robust and easier to understand.
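A rough sketch of what such a two-direction scan could look like. Everything here is hypothetical (the function name, the parameters, and the choice to stop at the first good window from each end); it is meant to show the idea, not to be a patch for sickle.

```python
def two_way_cuts(quals, window, q_min, min_len, no_fiveprime=False):
    """Hypothetical two-direction scan: the 5' cut is searched from the left
    and the 3' cut from the right, independently of each other.

    Returns (start, end) of the kept region, or None if the read is dropped.
    """
    starts = range(len(quals) - window + 1)

    def first_good_window(order):
        # First window (in the given scan order) whose mean quality passes.
        for i in order:
            if sum(quals[i:i + window]) / window >= q_min:
                return i
        return None

    left = first_good_window(starts)
    right = first_good_window(reversed(starts))
    if left is None:
        return None  # no window passes the threshold at all
    if no_fiveprime:
        start = 0  # keep the 5' end untouched
    else:
        # First passing base inside the first good window from the left.
        start = next(j for j in range(left, left + window) if quals[j] >= q_min)
    # Last passing base inside the first good window from the right.
    last_good = next(j for j in range(right + window - 1, right - 1, -1)
                     if quals[j] >= q_min)
    end = last_good + 1
    if end - start < min_len:
        return None
    return start, end


# Hypothetical read: 5 poor bases, 30 good bases, 5 poor bases.
quals = [2] * 5 + [35] * 30 + [2] * 5
print(two_way_cuts(quals, window=4, q_min=20, min_len=20))
print(two_way_cuts(quals, window=4, q_min=20, min_len=20, no_fiveprime=True))
```

With this approach the example read is kept as bases 0 to 35 when 5' trimming is disabled, instead of being dropped. One trade-off to be aware of: because each end stops at the first good window it meets, a low-quality dip in the middle of the read is retained, which a single forward scan would have cut at.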

biocyberman, May 08 '14 10:05

Hello,

Sickle wasn't written with colorspace reads in mind, so I can't really say why it would be behaving strangely with those reads. If you create a robust solution, maybe we could integrate it into sickle.

  • Nik.

Reply to this email directly or view it on GitHub: https://github.com/najoshi/sickle/issues/24#issuecomment-42534622

Nikhil Joshi Bioinformatics Analyst/Programmer UC Davis Bioinformatics Core http://bioinformatics.ucdavis.edu/ najoshi -at- ucdavis -dot- edu 530.752.2698 (w)

najoshi, May 09 '14 00:05