demultiplex

Add Kraken2

Open • edmundmiller opened this issue 3 years ago • 9 comments

Not sure what was going on with this one before; maybe @csawye01 or @drpatelh can shed some light here.

edmundmiller, Apr 26 '22 18:04

It was meant to screen the reads for contamination quickly and in an unbiased way across organisms.

drpatelh, Apr 26 '22 19:04
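As a rough illustration of that kind of screen, the sketch below parses a Kraken2 report and flags species-level hits above a percentage threshold that are not the expected organism. It assumes the standard tab-separated Kraken2 report layout (clade percentage, clade read count, direct read count, rank code, NCBI taxid, indented name), optionally with the extra minimizer columns in the middle. The helper names, the 1% default threshold and the toy report are hypothetical and are not taken from nf-core/demultiplex.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ReportRow:
    pct: float        # % of fragments in the clade rooted at this taxon
    clade_reads: int  # fragments assigned to this clade
    rank: str         # rank code, e.g. "U" (unclassified), "G", "S", "S1"
    taxid: int        # NCBI taxonomy ID
    name: str         # scientific name, indentation stripped


def parse_kraken2_report(text: str) -> list[ReportRow]:
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        # Rank, taxid and name are always the last three columns, so this
        # handles both the 6-column report and the 8-column variant
        # produced with --report-minimizer-data.
        rows.append(ReportRow(
            pct=float(fields[0]),
            clade_reads=int(fields[1]),
            rank=fields[-3].strip(),
            taxid=int(fields[-2]),
            name=fields[-1].strip(),
        ))
    return rows


def unclassified_pct(rows: list[ReportRow]) -> float:
    # A high value can hint that the database lacks the organisms present.
    return sum(r.pct for r in rows if r.rank == "U")


def flag_contaminants(rows, expected, min_pct=1.0):
    """Hypothetical helper: species at or above min_pct not in `expected`."""
    return [r.name for r in rows
            if r.rank == "S" and r.pct >= min_pct and r.name not in expected]


# Hand-made toy report (not real data); intermediate ranks omitted.
TOY_REPORT = "\n".join([
    " 10.00\t100\t100\tU\t0\tunclassified",
    " 90.00\t900\t45\tR\t1\troot",
    " 60.00\t600\t600\tS\t9606\t      Homo sapiens",
    " 25.00\t250\t250\tS\t562\t      Escherichia coli",
    "  0.50\t5\t5\tS\t1280\t      Staphylococcus aureus",
])

if __name__ == "__main__":
    rows = parse_kraken2_report(TOY_REPORT)
    print(unclassified_pct(rows))                     # 10.0
    print(flag_contaminants(rows, {"Homo sapiens"}))  # ['Escherichia coli']
```

Keying on the rank code rather than on indentation keeps the parser simple; the trade-off is that strain-level rows (rank `S1` and below) are ignored by `flag_contaminants`.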

I'm not really sure this is in the scope of a demultiplexing workflow. It sounds to me more like extensive input QC that should be done before starting a workflow.

matthdsm, Oct 07 '22 07:10

I guess the thinking behind this was that the raw reads come out of the demultiplexing workflow, so it would be good to offer this here as an option for "pre-pipeline" QC (like FastQC). This sort of information could also be useful to sequencing facilities that are interested in the results of demultiplexing. I suspect it would be more effort to add Kraken2 to every individual pipeline, but I'm also not fussed if it isn't added here. The biggest complication is sourcing a Kraken2 database that is comprehensive enough to contain "most" species for contamination detection.

drpatelh, Oct 07 '22 09:10

It would be awesome to have contamination detection built in here, though.

drpatelh, Oct 07 '22 09:10

No problem, we can keep the issue around for the future. Will there be room on the refgenie server for a Kraken2 database?

matthdsm, Oct 07 '22 10:10

Also, does Kraken2 support remote databases, or does the database have to be local? For example, can it read its files straight from S3, or does it have to stage the entire thing locally?

matthdsm, Oct 07 '22 10:10

Hey @matthdsm, I already have a subworkflow fleshed out that can subsample FASTQs (unbiased) with seqtk to speed up the process and then runs Kraken2 on them, creating a report that can be fed directly into MultiQC. We already use this internally; I'll contribute it to the subworkflows in nf-core/modules now, and then we can simply take it from there 👍🏻

apeltzer, Oct 26 '22 09:10
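A minimal sketch of the subsample-then-classify idea described above, written as plain Python that builds and runs the commands. This is not the actual nf-core subworkflow; the function names, output file names, database path and default read count are hypothetical, and it assumes `seqtk` and `kraken2` are on `PATH`. For paired-end data it reuses the same `seqtk sample -s` seed for R1 and R2 so mates stay in sync, as the seqtk README recommends. The `--report` file is the per-sample Kraken2 summary that MultiQC's Kraken module can parse.

```python
from __future__ import annotations

import subprocess


def plan_subsampled_kraken2(
    sample: str,
    r1: str,
    r2: str | None = None,
    db: str = "kraken2_db",   # hypothetical database path
    n_reads: int = 100_000,   # hypothetical subsample size
    seed: int = 100,
    threads: int = 4,
) -> list[tuple[list[str], str | None]]:
    """Hypothetical helper: build seqtk + kraken2 commands for one sample.

    Each step is (argv, stdout_path); stdout_path is None when the tool
    writes its own output files.
    """
    steps = []
    subsampled = []
    reads = [r1] if r2 is None else [r1, r2]
    for i, fastq in enumerate(reads, start=1):
        out = f"{sample}_sub_R{i}.fastq"
        # seqtk writes the subsample to stdout; reusing the same seed for
        # R1 and R2 keeps mates paired.
        steps.append((["seqtk", "sample", "-s", str(seed), fastq, str(n_reads)], out))
        subsampled.append(out)
    kraken2 = [
        "kraken2", "--db", db, "--threads", str(threads),
        "--report", f"{sample}.kraken2.report.txt",  # summary report for MultiQC
        "--output", f"{sample}.kraken2.output.txt",  # per-read classifications
    ]
    if r2 is not None:
        kraken2.append("--paired")
    kraken2.extend(subsampled)
    steps.append((kraken2, None))
    return steps


def run_steps(steps) -> None:
    """Run each command, redirecting stdout to a file when requested."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as fh:
                subprocess.run(argv, stdout=fh, check=True)
```

For example, `run_steps(plan_subsampled_kraken2("S1", "S1_R1.fastq.gz", "S1_R2.fastq.gz", db="/path/to/k2_db"))` would subsample both mates and classify them as a pair. Keeping the command plan separate from execution makes it easy to inspect; in a Nextflow subworkflow each step would more naturally be its own process.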

Here's a PR adding a subworkflow for this. We already use it in our own pipelines, but I'm trying to contribute and reuse subworkflows from nf-core to make things easier 👍🏻

https://github.com/nf-core/modules/pull/2397

apeltzer, Oct 26 '22 11:10

Hi!

We actually have a use case for this at our platform and would like to include @apeltzer's subworkflow in nf-core/demultiplex.

Is it OK if we start working on this during the Hackathon?

Aratz, Oct 16 '23 09:10

This is now implemented via https://github.com/nf-core/demultiplex/pull/220 and will be part of the 1.5.0 release.

apeltzer, Aug 08 '24 21:08