Feature suggestion--parallel processing reads

Open seandavi opened this issue 12 years ago • 5 comments

I have been using scythe in a rather inefficient way by loading up most (all) of the known Illumina adapters and running scythe with that list. We have projects that extend over several years and capturing exactly which adapter sequence has been used is not as simple as it should be. As you can imagine, using 100 or so adapter sequences slows the process somewhat. Parallel processing over reads might improve performance (or might not).

seandavi avatar Aug 31 '12 11:08 seandavi
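
For illustration, parallel processing over reads could look roughly like the sketch below: reads are grouped into batches, and each batch is checked against the full adapter list in a separate worker process. This is not scythe's code or its matching method. The exact prefix/suffix comparison stands in for scythe's probabilistic matching, and every name here (`trim_read`, `trim_batch`, `batches`, `trim_parallel`, `MIN_OVERLAP`) is hypothetical. Whether this actually runs faster depends on how much time goes to matching versus reading and writing files.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import islice

MIN_OVERLAP = 4  # hypothetical threshold for this sketch, not a scythe parameter


def trim_read(seq, adapters, min_overlap=MIN_OVERLAP):
    """Cut seq where the longest adapter prefix exactly matches its 3' end."""
    keep = len(seq)
    for adapter in adapters:
        # Try the longest possible overlap first and stop at the first hit.
        for k in range(min(len(adapter), len(seq)), min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                keep = min(keep, len(seq) - k)
                break
    return seq[:keep]


def trim_batch(args):
    """Worker entry point: trim every read in one batch."""
    seqs, adapters = args
    return [trim_read(s, adapters) for s in seqs]


def batches(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def trim_parallel(seqs, adapters, workers=4, batch_size=10000):
    """Trim reads in worker processes; output order matches input order."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        jobs = ((b, adapters) for b in batches(seqs, batch_size))
        # Note: map() submits all batches up front, so a very large input
        # would need streaming submission instead.
        for trimmed in pool.map(trim_batch, jobs):
            yield from trimmed


if __name__ == "__main__":
    adapters = ["AGATCGGAAGAGC"]
    reads = ["ACGTACGTAGATCGG", "TTTTGGGGCCCC"]
    for read in trim_parallel(reads, adapters, workers=2, batch_size=1):
        print(read)
```

Batching keeps the per-task overhead small relative to the matching work. With a long adapter list the inner loop dominates, which is the case where splitting over reads is most likely to pay off.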

Sean, if proprietary adapters aren't an issue, could you email me an example of your adapters? I am planning on adding multithreading to scythe, but I think I could leverage some other optimizations if I can characterize redundancy in adapters (which is the case with some, e.g. TruSeq adapters). My email is [email protected] (sans poly-A spam-filter tail).

vsbuffalo avatar Dec 09 '12 23:12 vsbuffalo

I was playing with the contaminants file from fastqc. That seems a pretty complete list, so you could use that.

Thanks, Sean

seandavi avatar Dec 10 '12 00:12 seandavi

In the words of Jay-Z, I've got 64 cores but my read trimmer uses one...

ryneches avatar Mar 17 '14 22:03 ryneches

Nik at the Bioinformatics Core played around with this. I believe his verdict was that the bottleneck was predominantly I/O and that the overhead of parallelization led to an overall slower result, but I may be wrong about this. Taking a look at parallelization is on my todo list, but unfortunately there are more pressing tasks. I am happy to incorporate pull requests into the main codebase, as long as single-core operation is not adversely affected and the code is clean.

vsbuffalo avatar Mar 17 '14 22:03 vsbuffalo

Oh. I see why this has been running for so long. The disk is full, but scythe doesn't seem to exit when it runs out of space. :-(

ryneches avatar Mar 17 '14 22:03 ryneches