
"dry run" option?

Open jemunro opened this issue 7 years ago • 15 comments

When using the -resume option, it would be helpful to see which processes are going to be run and which will be reused from the cache. Additionally, it would be useful to indicate which hashes have changed, requiring a cached process to be rerun. I know this can be done manually with the -dump-hashes option, but this is not very user-friendly.
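To make the request concrete, here is a minimal Python sketch of the kind of report described above. It is a toy model, not Nextflow's actual caching logic: the function names (`component_hashes`, `explain_resume`), the component keys (`script`, `input`), and the example tasks are all hypothetical. Each task component is hashed separately, so a resumed run can say which component changed.

```python
import hashlib

def component_hashes(task):
    """Hash each named component of a task separately (toy model, hypothetical)."""
    return {key: hashlib.sha256(repr(value).encode()).hexdigest()
            for key, value in sorted(task.items())}

def explain_resume(previous, current):
    """Classify each current task as cached or to be run, with the reason."""
    report = {}
    for name, task in current.items():
        if name not in previous:
            report[name] = ("run", ["no cache entry"])
            continue
        old = component_hashes(previous[name])
        new = component_hashes(task)
        # List every component whose hash differs between the two runs.
        changed = sorted(k for k in old.keys() | new.keys()
                         if old.get(k) != new.get(k))
        report[name] = ("run", changed) if changed else ("cached", [])
    return report

# Hypothetical task definitions from the previous run and the edited pipeline.
previous = {
    "fastqc": {"script": "fastqc reads.fq", "input": "reads.fq"},
    "align": {"script": "bwa mem ref.fa reads.fq", "input": "reads.fq"},
}
current = {
    "fastqc": {"script": "fastqc reads.fq", "input": "reads.fq"},
    "align": {"script": "bwa mem -t 8 ref.fa reads.fq", "input": "reads.fq"},
    "sort": {"script": "samtools sort aln.bam", "input": "aln.bam"},
}

report = explain_resume(previous, current)
for name, (status, why) in report.items():
    print(name, status, why)
```

In this toy example, `fastqc` is reused, `align` is rerun because its script changed, and `sort` is run because it has no cache entry.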

jemunro avatar Aug 23 '18 07:08 jemunro

Yes, it could make sense to have a kind of dry run showing which cached tasks would be used.

pditommaso avatar Aug 28 '18 12:08 pditommaso

+1! Would be super useful!

fmorency avatar Jan 07 '19 20:01 fmorency

Just some thoughts. A dry-run or plan command should produce the full graph structure, or just a part of the DAG, in text or UI mode. It should fail fast while constructing the AST. Parts of the AST could be validated by a JSON schema validator as a workaround.

nextflow plan -target=process.1A_prepare_genome_samtools
nextflow plan -target=module.'rnaseq.nf'.fastqc

blacky0x0 avatar Feb 20 '19 20:02 blacky0x0

Was this ever implemented?

stevekm avatar Jan 14 '20 14:01 stevekm

Is a dry-run or plan feature available in the latest Nextflow version?

kopardev avatar Feb 21 '20 15:02 kopardev

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Apr 27 '20 02:04 stale[bot]

Would still love this

jimhavrilla avatar Jun 25 '20 21:06 jimhavrilla

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Nov 23 '20 01:11 stale[bot]

Nextflow now supports the ability to simulate the execution using command stubs. See #1774.

More here https://www.nextflow.io/docs/edge/process.html#stub

pditommaso avatar Nov 23 '20 06:11 pditommaso

@pditommaso a stub run does not cover the following situation, correct? The situation:

  • The pipeline developer runs their Nextflow pipeline (e.g., large-scale requiring large amounts of compute resources & time)
  • The pipeline dies much of the way through the run (e.g., after days of running) due to an error in one step of the code (e.g., the code didn't scale to tens of thousands of files)
  • The pipeline developer modifies the code
  • The pipeline developer wants to know which parts of the pipeline will be re-run as a result of the code modification

I'm in that current situation, and I'm afraid that my modification will restart the pipeline from (near) the beginning, which means a loss of days and $$. I believe that a stub run must start from the beginning, and will thus not show me whether my modifications will result in a (near complete) restart of my pipeline run.

I migrated to Nextflow from Snakemake, and the --dryrun feature of Snakemake has helped me many times to avoid needless computation. It would be great to have the same functionality in Nextflow.

nick-youngblut avatar Apr 11 '23 15:04 nick-youngblut

I want to bump the feature request for dry runs as a previous Snakemake user.

In the meantime, if Nextflow were to output the reason for reruns, that would also be helpful. Snakemake gives a reason for every rule.

guarelin avatar Aug 22 '23 17:08 guarelin

I took a first pass at implementing the most recent suggestions in this thread. Namely, resume a pipeline and determine which tasks would be re-executed, without actually executing them.

Feel free to test the linked PR. Basically run a pipeline with -dry-run and it will resume, print every task that is re-executed, then exit.

I think it could be improved by also showing the reason why a task is executed: no cache entry found, cache folder found but outputs are missing, etc. If someone could share an example Snakemake scheduling plan, it might be useful to compare.

The drawback of this approach is that it doesn't execute the entire pipeline; it only goes as far as the cache can take it. On the other hand, a stub run can execute the entire pipeline but does not correspond to a real pipeline run.
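As a rough illustration of that trade-off, here is a small Python sketch of a cache-only dry run. It is a toy model under stated assumptions, not the code in the linked PR: the `dry_run` function, the `deps` and `cache` structures, and the task names are hypothetical. Tasks downstream of a task that must run are reported as unknown, because their inputs would not exist yet.

```python
from graphlib import TopologicalSorter

def dry_run(deps, cache):
    """Plan a resume without executing anything (toy model, hypothetical).

    deps:  task name -> set of upstream task names
    cache: task name -> "ok" or "outputs missing" (absent means no entry)
    """
    plan = {}
    # static_order() yields each task after all of its upstream tasks.
    for task in TopologicalSorter(deps).static_order():
        upstream = deps.get(task, ())
        if any(plan[u][0] != "cached" for u in upstream):
            # The dry run cannot go past a task that has to be executed.
            plan[task] = ("unknown", "upstream task not cached")
        elif cache.get(task) == "ok":
            plan[task] = ("cached", "cache entry found")
        elif task in cache:
            plan[task] = ("would run", "cache folder found but outputs are missing")
        else:
            plan[task] = ("would run", "no cache entry found")
    return plan

# Hypothetical pipeline: align depends on fastqc, sort on align, multiqc on fastqc.
deps = {
    "fastqc": set(),
    "align": {"fastqc"},
    "sort": {"align"},
    "multiqc": {"fastqc"},
}
cache = {"fastqc": "ok", "align": "outputs missing", "multiqc": "ok"}

plan = dry_run(deps, cache)
for task, (status, reason) in plan.items():
    print(f"{task}: {status} ({reason})")
```

Here `align` is reported as a rerun with its reason, while `sort` cannot be evaluated at all, which is the limitation described above.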

bentsherman avatar Aug 24 '23 15:08 bentsherman

See also my comments here: https://github.com/nextflow-io/nextflow/issues/1458#issuecomment-1691952174

bentsherman avatar Aug 24 '23 15:08 bentsherman

I am new to the Nextflow community, but I would really love to have this option. I use it all the time with Snakemake.

KristinaGagalova avatar Nov 01 '23 02:11 KristinaGagalova

+1

subwaystation avatar Apr 23 '24 08:04 subwaystation