nextflow
Add a `clean` parameter to `publishDir`
New feature
A `clean` parameter added to `publishDir`. Example:
publishDir path: 's3://path/to/output', clean: true
Usage scenario
Output filenames often change during development of a pipeline, or even because the data changed. It would be good to know that the contents of the publish location are "pristine": that none of the files in that location are cruft or left over from old runs of the pipeline.
Suggest implementation
Simply add a boolean `clean` parameter to `publishDir` that is only valid if `path` is a directory. Before output is moved or copied to that location, all data in that directory is deleted recursively.
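For a local directory, the requested behaviour could look roughly like the sketch below. It is only an illustration of the semantics, not how Nextflow implements `publishDir`: the `publish_clean` helper and the file names are hypothetical, and an object store such as S3 would need its own provider-specific deletion.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: publish_clean SRC_DIR DEST_DIR
# Empties DEST_DIR recursively, then copies the contents of SRC_DIR into it.
publish_clean() {
    local src="$1" dest="$2"
    # Refuse an empty destination so the cleanup below can never target "/".
    [ -n "$dest" ] || { echo "destination must not be empty" >&2; return 1; }
    mkdir -p "$dest"
    # Delete everything inside dest (dotfiles included) but keep dest itself.
    find "$dest" -mindepth 1 -delete
    # Copy the contents of src, not the directory itself.
    cp -R "$src"/. "$dest"/
}

# Demo with throwaway directories.
work="$(mktemp -d)"
mkdir -p "$work/new" "$work/publish"
echo "stale" > "$work/publish/old_name.txt"   # leftover from an earlier run
echo "fresh" > "$work/new/output.txt"         # output of the current run
publish_clean "$work/new" "$work/publish"
ls "$work/publish"
```

After the call, only `output.txt` remains in the publish directory; the file from the earlier run is gone.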
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Just bumping hoping for at least a comment. 😉
If I understood your request correctly, this is something you could do with `beforeScript`. Check the example below:
ex.nf
params.create_empty_file = ''

process CREATE_EMPTY_FILE {
    output:
    path 'empty_file.txt'

    """
    touch empty_file.txt
    """
}

process FOO {
    input:
    val x

    output:
    path 'output.txt'

    """
    echo $x > output.txt
    """
}

workflow {
    if (params.create_empty_file) {
        CREATE_EMPTY_FILE()
    }
    FOO(params.str)
}
nextflow.config
process {
    publishDir = "/Users/mribeirodantas/test"
    beforeScript = "cd ${publishDir}; rm -rf *; cd -"
}
In the pipeline above, an extra file is sometimes created in the `publishDir` directory, depending on the `--create_empty_file` parameter. I did this to make it clear that the `publishDir` location is cleaned before each process starts (I could also have set `overwrite` to false in `publishDir` to show you that). You can try this snippet the following way (make sure to replace `/Users/mribeirodantas` with your own home directory):
nextflow run ex.nf --str 'oi' --create_empty_file
ls /Users/mribeirodantas/test/
nextflow run ex.nf --str 'oi'
ls /Users/mribeirodantas/test/
It shouldn't be hard to make this dynamic (only clean with `--clean_publishDir`, for example) by moving part of the code from the `nextflow.config` file to the pipeline script.
Does this help? 😄
I appreciate the thought, but no, this doesn't work for a couple of reasons:
- This doesn't work for the various providers. While testing may be done locally, eventually it will run on AWS or Google, etc. Having it as part of the `publishDir` directive allows each provider to implement this properly.
- The `beforeScript` directive runs, obviously, before the process script. Because of this, whatever the current (or previous) output of the process was, it will be deleted even if the current execution fails, which would be undesirable behavior. The existing output should only be replaced by new output data and only upon success.
Again, thanks for the idea, though.
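To make the second point concrete, here is a minimal local-filesystem sketch of "replace only upon success": new output is written to a staging directory and swapped in only when the step exits with status 0. The `publish_on_success` helper, the `.staging` suffix, and the demo step functions are all hypothetical, and none of this is Nextflow's implementation. The delete-then-move swap is also not atomic, and object stores without a rename operation would need a different approach.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: publish_on_success DEST_DIR COMMAND [ARGS...]
# Runs COMMAND with a fresh staging directory appended as its last argument.
# DEST_DIR is replaced by the staging directory only if COMMAND exits with 0.
publish_on_success() {
    local dest="${1%/}"; shift          # drop a trailing slash, if any
    local staging
    mkdir -p "$(dirname "$dest")"
    staging="$(mktemp -d "${dest}.staging.XXXXXX")" || return 1
    if "$@" "$staging"; then
        rm -rf "$dest"                  # drop the old results only after success
        mv "$staging" "$dest"
    else
        local status=$?
        rm -rf "$staging"               # failure: discard partial output, keep old results
        return "$status"
    fi
}

# Demo with a throwaway directory that already holds an earlier result.
work="$(mktemp -d)"
mkdir -p "$work/publish"
echo "old" > "$work/publish/output.txt"

# A failing step writes partial output and exits non-zero.
failing_step() { echo "partial" > "$1/output.txt"; return 1; }
publish_on_success "$work/publish" failing_step || echo "step failed, publish dir untouched"
cat "$work/publish/output.txt"

# A succeeding step replaces the directory wholesale, even with a new file name.
good_step() { echo "new" > "$1/renamed_output.txt"; }
publish_on_success "$work/publish" good_step
ls "$work/publish"
```

The failing step leaves the earlier `output.txt` in place, while the successful one leaves only `renamed_output.txt`.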
I agree with the 1st point. The idea was to propose a workaround to help you [with whatever compute environment you're working on] before this feature is fully implemented, tested, and supported by all possible compute environments 😅
As for the 2nd point, depending on your situation, there is indeed a lot more work to do. This was just a simple snippet to mention the feature.