Resolve relative paths against the work dir

Open drpatelh opened this issue 3 years ago • 6 comments

Using nf-core pipelines as an example: by default we set --outdir ./results, which is fine for shared file systems and HPC environments. With the growing adoption of cloud platforms this no longer works, and it can cost money and time if a relative path is provided instead of an absolute one. For example, on AWS Batch the results are written to the head node by default and are lost with it.

More details in https://github.com/nf-core/tools/issues/1415

As you mentioned in https://github.com/nf-core/tools/issues/1415#issuecomment-1046314596 @pditommaso:

"to resolve relative paths against the work-dir instead of the launching directory"

On AWS Batch, by default, aren't the work-dir and the launch dir the same, i.e. scratch space on the head node?

I wonder whether this would still be a problem if the intention of the user is to put the results in a different bucket altogether? In which case, making --outdir mandatory would make them think about this beforehand (but this is quite specific to nf-core pipelines).

drpatelh avatar Feb 20 '22 21:02 drpatelh

I believe Tower launches from the root directory by default, then moves the log files into the work directory. Launching from the work directory would therefore be much better, also because the log files wouldn't be lost if the head node is killed suddenly.

But is the idea to resolve against the work directory only in specific cases, like if the executor is awsbatch or the launch directory is / or a nextflow CLI flag is provided? It wouldn't make sense to do this all the time.

bentsherman avatar Feb 21 '22 14:02 bentsherman

"But is the idea to resolve against the work directory only in specific cases, like if the executor is awsbatch or the launch directory is / or a nextflow CLI flag is provided? It wouldn't make sense to do this all the time."

Yes, this is why I think making --outdir mandatory will go some way towards making users think about how they should specify their output paths and then they can always use a config option when it is added as a fallback.

drpatelh avatar Feb 21 '22 14:02 drpatelh

This is happening because Nextflow resolves relative paths against the current working directory. When running it in a classic grid HPC, this usually corresponds to a subdirectory of the user home.

When running on AWS Batch or in other containerised environments, the current working directory is a path inside the container file system, which leads to the counterintuitive behaviour described above.

I think this could be addressed by introducing an environment variable, e.g. NXF_RELATIVE_FILE_BASE, that specifies the base directory against which relative paths should be resolved.
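
To make the proposed behaviour concrete, here is a minimal Python sketch of the resolution rule. It is not Nextflow's implementation: the function name `resolve_path` is hypothetical, and NXF_RELATIVE_FILE_BASE is only the name suggested in this thread. The sketch assumes that absolute paths and paths with a URI scheme are left untouched, and that relative paths fall back to the current working directory when no base is set.

```python
import os
import posixpath
from typing import Optional

# Hypothetical variable name taken from the proposal in this thread;
# it is an assumption, not a confirmed Nextflow setting.
BASE_ENV_VAR = "NXF_RELATIVE_FILE_BASE"


def resolve_path(path: str, base: Optional[str] = None) -> str:
    """Resolve a relative path against `base`, else the env var, else the cwd."""
    # Absolute paths and URIs that already carry a scheme are returned unchanged.
    if "://" in path or posixpath.isabs(path):
        return path

    if base is None:
        base = os.environ.get(BASE_ENV_VAR) or os.getcwd()

    if "://" in base:
        # Remote base such as s3://my_bucket/: normalise only the part after the scheme.
        scheme, rest = base.split("://", 1)
        return scheme + "://" + posixpath.normpath(posixpath.join(rest, path))

    # Local base directory.
    return posixpath.normpath(posixpath.join(base, path))


print(resolve_path("./results", "s3://my_bucket/"))  # s3://my_bucket/results
print(resolve_path("./results", "/scratch/work"))    # /scratch/work/results
```

With a rule like this, the default --outdir ./results would land in the bucket rather than on the head node whenever a remote base is configured, while explicitly absolute output paths keep working as before.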

pditommaso avatar Feb 21 '22 22:02 pditommaso

It would be nice to have a config option for this as well as an env setting. Or would you suggest using something like this instead @pditommaso?

env {
    NXF_RELATIVE_FILE_BASE = "s3://my_bucket/"
}

Another use case that has come up a couple of times recently in the context of AWS Batch (cc @BrunoGrandePhD) is to dynamically output the results relative to workDir. The value of workDir is created dynamically by Tower at run time, so this saves having to explicitly name output folders. There are a couple of hacky workarounds, but maybe this warrants another Nextflow environment variable, NXF_RELATIVE_FILE_WORK_DIR? Let me know if you want me to open another issue. Peace!

drpatelh avatar Apr 15 '22 20:04 drpatelh

Looking at that

Peace

☮️ ❤️

pditommaso avatar Apr 19 '22 12:04 pditommaso

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Sep 20 '22 20:09 stale[bot]

We neeeeeeddddd this!

drpatelh avatar Mar 18 '23 12:03 drpatelh

Solved by #3942

pditommaso avatar Jun 13 '23 12:06 pditommaso