Explicit subsampling step in rnaseq pipeline

Open ewallace opened this issue 2 years ago • 2 comments

Description of feature

Subsampling sequencing data before running a pipeline is good practice for testing configurations and failing fast. Allowing the user to subsample the input data before running the entire pipeline would provide a quicker, in-line way to validate that the pipeline runs, troubleshoot, and check inputs.

I would like to request optional subsampling as a feature; I think it will save a lot of people a lot of time. Yes, it's possible for users to manually subsample data and then feed that into the pipeline, but that seems to go against the Nextflow spirit. Having this option inline would let users test-run the pipeline with --subsample-reads 100000, test everything within minutes, and then edit that one parameter to run on all the input data.

It's probably achievable with fq subsample.
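
To make the request concrete, here is a rough Python sketch of what an explicit subsampling step could do: keep a fixed number of read pairs, chosen uniformly at random with a fixed seed, while keeping mates in sync. This is reservoir sampling written for illustration only. It is not how fq subsample or the pipeline works internally, and the names `read_fastq` and `subsample_pairs` are made up for this example.

```python
import random
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# One FASTQ record: (header, sequence, separator, qualities).
Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records from any iterable of lines (hypothetical helper)."""
    it = iter(lines)
    while True:
        chunk = tuple(line.rstrip("\n") for line in islice(it, 4))
        if not chunk:
            return
        if len(chunk) != 4:
            raise ValueError("truncated FASTQ record")
        yield chunk


def subsample_pairs(
    r1: Iterable[Record], r2: Iterable[Record], n: int, seed: int = 0
) -> List[Tuple[Record, Record]]:
    """Keep at most n read pairs, chosen uniformly at random (reservoir sampling).

    Mates are sampled together so R1 and R2 stay in sync. Note that zip()
    stops at the shorter input, so mismatched files are not detected here.
    """
    rng = random.Random(seed)  # fixed seed makes the test run reproducible
    reservoir: List[Tuple[int, Tuple[Record, Record]]] = []
    for i, pair in enumerate(zip(r1, r2)):
        if i < n:
            reservoir.append((i, pair))
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = (i, pair)
    reservoir.sort(key=lambda item: item[0])  # restore original input order
    return [pair for _, pair in reservoir]
```

With something like this in place, a test run would call `subsample_pairs(read_fastq(f1), read_fastq(f2), 100000)` on the opened R1 and R2 files and pass only the retained pairs downstream. Asking for more pairs than exist simply returns all of them.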

Note that the current (v3.12.0) "subsample" step does not do that, see issue #1095.

Issue #1096 suggests a different workaround, but only if using FastP for read trimming.

ewallace, Oct 17 '23 14:10

Could be solved by #1096

drpatelh, May 29 '24 10:05

Potentially useful, but I don't think it's currently prioritised for development, so removing from milestone.

pinin4fjords, Apr 22 '25 11:04