Add a `clean` parameter to `publishDir`

Open massung opened this issue 2 years ago • 3 comments

New feature

Add a clean parameter to publishDir. Example:

publishDir path: 's3://path/to/output', clean: true

Usage scenario

Output filenames often change during development of a pipeline, or even because the data changed. It would be good to know that the contents of the publish location are "pristine" and that none of the files in that location are cruft left over from old runs of the pipeline.

Suggest implementation

Simply add a clean parameter (boolean) to publishDir that is only valid if path is a directory. Before output is moved or copied to that location, all data in that directory is deleted recursively.
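To make the requested semantics concrete, here is a minimal local-filesystem sketch of what clean: true would mean. The function name publish_clean and its arguments are hypothetical and not part of Nextflow. It assumes a POSIX shell with find and cp, and it does not cover remote targets such as S3.

```shell
# Hypothetical helper, not a Nextflow feature: empty the publish
# directory, then copy the new outputs into it.
publish_clean() {
  pc_src="$1"
  pc_dest="$2"
  # The proposal only applies when the publish path is a directory.
  if [ -e "$pc_dest" ] && [ ! -d "$pc_dest" ]; then
    echo "publish_clean: $pc_dest is not a directory" >&2
    return 1
  fi
  mkdir -p "$pc_dest"
  # Delete everything inside the directory (dotfiles included),
  # but keep the directory itself.
  find "$pc_dest" -mindepth 1 -delete
  # Copy the contents of the source directory into the clean target.
  cp -R "$pc_src"/. "$pc_dest"/
}
```

Calling publish_clean ./results /path/to/publish would leave only the files from ./results in the publish directory.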

massung avatar Sep 24 '21 14:09 massung

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Feb 22 '22 07:02 stale[bot]

Just bumping hoping for at least a comment. 😉

massung avatar Feb 22 '22 13:02 massung

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jul 31 '22 17:07 stale[bot]

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Dec 31 '22 21:12 stale[bot]

If I understood your request correctly, this is something you could do with beforeScript. Check the example below:

ex.nf

params.create_empty_file = ''

process CREATE_EMPTY_FILE {
  output:
    path 'empty_file.txt'
  """
  touch empty_file.txt
  """
}

process FOO {
  input:
    val x
  output:
    path 'output.txt'
  """
  echo $x > output.txt
  """
}

workflow {
  if (params.create_empty_file) {
    CREATE_EMPTY_FILE()
  }
  FOO(params.str)
}

nextflow.config

process {
  publishDir = "/Users/mribeirodantas/test"
  // Chain with && so that rm never runs in the wrong directory if cd fails
  beforeScript = "cd ${publishDir} && rm -rf * && cd -"
}

In the pipeline above, an extra file is created in the publishDir directory only when the --create_empty_file parameter is set. I did this to make it clear that the publishDir location is cleaned before starting (I could also have set overwrite to false in publishDir to show you that). You can try the snippet as follows (make sure to replace mribeirodantas with your own home directory):

nextflow run ex.nf --str 'oi' --create_empty_file
ls /Users/mribeirodantas/test/
nextflow run ex.nf --str 'oi'
ls /Users/mribeirodantas/test/

It shouldn't be hard to make this dynamic (only clean with --clean_publishDir, for example) by moving part of the code from the nextflow.config file to the pipeline script.

Does this help? 😄

mribeirodantas avatar Jan 01 '23 16:01 mribeirodantas

I appreciate the thought, but no, this doesn't work, for a couple of reasons:

  1. This doesn't work for the various providers. While testing may be done locally, eventually it will run on AWS or Google, etc. Having it as part of the publishDir directive allows for each provider to implement this properly.
  2. The beforeScript directive runs - obviously - before the process script. Because of this, whatever the current (or previous) output of the process was, it will be deleted even if the current execution fails, which would be undesirable behavior. The existing output should only be replaced by new output data and only upon success.
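The ordering asked for in point 2 can be sketched locally as follows. This is only an illustration under assumptions: run_then_publish is a hypothetical helper, not a Nextflow feature, and the job is assumed to write its outputs into a staging directory passed as its last argument. The clean-and-copy step is not atomic, and remote providers would each need their own implementation.

```shell
# Hypothetical helper: run a job into a temporary staging directory and
# replace the publish directory's contents only if the job succeeds.
run_then_publish() {
  rp_dest="$1"
  shift
  rp_stage=$(mktemp -d) || return 1
  # The job receives the staging directory as its last argument.
  if "$@" "$rp_stage"; then
    mkdir -p "$rp_dest"
    # Success: drop the old outputs, then copy the new ones in.
    find "$rp_dest" -mindepth 1 -delete
    cp -R "$rp_stage"/. "$rp_dest"/
    rp_status=0
  else
    # Failure: keep the job's exit status and leave old outputs alone.
    rp_status=$?
    echo "job failed; leaving $rp_dest untouched" >&2
  fi
  rm -rf "$rp_stage"
  return "$rp_status"
}
```

With this ordering, a failed run keeps the previous outputs in place, which is the behavior the beforeScript workaround cannot provide.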

Again, thanks for the idea, though.

massung avatar Jan 01 '23 17:01 massung

I agree with the 1st point. The idea was to propose a workaround to help you [with whatever compute environment you're working on] before this feature is fully implemented, tested, and supported by all possible compute environments 😅

As for the 2nd point, depending on your situation, there is indeed a lot more work to do. This was just a simple snippet to mention the feature.

mribeirodantas avatar Jan 01 '23 17:01 mribeirodantas

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jun 10 '23 03:06 stale[bot]