Resolve relative paths against the work dir
Using nf-core pipelines as an example: by default we set --outdir ./results, which is fine for shared file systems / HPC environments. With the growing adoption of cloud, though, this no longer works and can end up costing money and time if a relative path is provided instead of an absolute one. For example, on AWS Batch the results will by default get written to the head node and are lost with it.
More details in https://github.com/nf-core/tools/issues/1415
As you mentioned in https://github.com/nf-core/tools/issues/1415#issuecomment-1046314596 @pditommaso
to resolve relative paths against the work-dir instead of the launching directory
On AWS Batch, by default, aren't the work dir and the launch dir the same, i.e. scratch space on the head node?
I wonder whether this would still be a problem if the intention of the user is to put the results in a different bucket altogether? In which case, making --outdir mandatory would make them think about this beforehand (but this is quite specific to nf-core pipelines).
I believe Tower by default launches from the root directory, then moves the log files into the work directory. So launching from the work directory would also be much better, because the log files wouldn't be lost if the head node is killed suddenly.
But is the idea to resolve against the work directory only in specific cases, like if the executor is awsbatch or the launch directory is / or a nextflow CLI flag is provided? It wouldn't make sense to do this all the time.
Yes, this is why I think making --outdir mandatory will go some way towards making users think about how they should specify their output paths and then they can always use a config option when it is added as a fallback.
This is happening because Nextflow resolves relative paths against the current working directory. When running on a classic grid / HPC system, this usually corresponds to a subdirectory of the user's home.
When running on Batch or other containerised environments, this is going to be a path in the container file system, which results in the counterintuitive behaviour described above.
Think this could be addressed by introducing an env setting e.g. NXF_RELATIVE_FILE_BASE that specifies the base directory against which relative paths should be resolved.
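To make the proposal concrete, here is a rough Python sketch of the resolution rule being discussed. It is not Nextflow code: the function name resolve_output_path is hypothetical, and NXF_RELATIVE_FILE_BASE is only the variable name suggested above, so the behaviour shown is an assumption about how such a setting could work.

```python
import os
import posixpath
from pathlib import Path


def resolve_output_path(path_str, env=None):
    """Illustrative only: resolve a user-supplied output path.

    `env` is a mapping standing in for the process environment
    (defaults to os.environ); NXF_RELATIVE_FILE_BASE is the
    variable name proposed in this thread, not an existing API.
    """
    env = os.environ if env is None else env

    # Absolute local paths and URIs such as s3://bucket/key are left untouched.
    if "://" in path_str or os.path.isabs(path_str):
        return path_str

    base = env.get("NXF_RELATIVE_FILE_BASE")
    if base is None:
        # Behaviour described in the thread: relative paths are resolved
        # against the current working (launch) directory.
        return str(Path.cwd() / path_str)

    # Proposed behaviour: resolve against the configured base instead.
    # The base may be a remote URI, so join it as a string, not a local path.
    relative = posixpath.normpath(path_str)  # "./results" -> "results"
    return base.rstrip("/") + "/" + relative


# Example: with the base set to a bucket, ./results lands in the bucket
# instead of on the head node's file system.
print(resolve_output_path("./results", {"NXF_RELATIVE_FILE_BASE": "s3://my_bucket/"}))
```

With no base configured the sketch falls back to the launch directory, so existing shared file system / HPC setups would behave as they do today.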
It would be nice to have a config option for this as well as an env setting. Or would you suggest using something like this instead @pditommaso?
env {
    NXF_RELATIVE_FILE_BASE = "s3://my_bucket/"
}
Another use case that has come up a couple of times recently in the context of AWS Batch (cc @BrunoGrandePhD) is to dynamically output the results relative to workDir. The value of workDir is dynamically created by Tower at run-time, so this saves having to explicitly name output folders. There are a couple of hacky workarounds, but maybe this warrants another NF environment variable, NXF_RELATIVE_FILE_WORK_DIR? Let me know if you want me to open another issue. Peace!
Looking at that
Peace
☮️ ❤️
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

We neeeeeeddddd this!
Solved by #3942