nextflow
Code executed using 'exec' is executed outside of work directory
Bug report
Expected behavior and actual behavior
The documentation says that I can replace a script command with native Groovy code (an exec: block instead of a script: block) to do some processing. However, when I do that, the code executes in the directory from which the pipeline was launched instead of in the process work directory. I couldn't find a good way to make it execute in the process work directory.
Expected behavior: the code executes in the task work directory, just as a script would.
Actual behavior: the code executes in the directory from which Nextflow was launched.
Steps to reproduce the problem
nextflow.enable.dsl = 2

process hello {
    echo true
    input:
    val world
    output:
    path 'test.txt'
    exec:
    file('test.txt').text = "hello $world"
}

workflow {
    Channel.fromList(['mars','jupiter']) | hello
}
Program output
N E X T F L O W ~ version 21.10.6
Launching `test.groovy` [hopeful_volhard] - revision: 3d50bfdfcf
executor > local (2)
[0b/fc4d48] process > hello (1) [100%] 1 of 1, failed: 1
Error executing process > 'hello (2)'
Caused by:
Missing output file(s) `test.txt` expected by process `hello (2)`
Source block:
file('test.txt').text = "hello $world"
Work dir:
...
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
Apart from executing in the wrong directory, this also has the undesirable consequence that the tasks overwrite each other's output files: at the end there is a single test.txt in the directory where I launched the pipeline, containing the output of whichever task happened to write last.
Additional context
I guess this could be tricky to fix if Nextflow executes the code directly in the same JVM as the Nextflow manager, since the JVM has a single working directory that cannot be changed separately for each of several concurrently running tasks. Perhaps the default behaviour of file could be modified so that it resolves relative paths against the work directory? It won't fix everything, but a subset of use cases would work.
If not, the documentation should at least warn about this.
I was also wondering whether there is a way to get the path of a task's actual work directory, so that I could resolve paths against it manually.
This happens because the JVM always resolves relative paths against the directory from which Nextflow was launched.
To write into the task work directory, resolve paths against the task.workDir attribute, e.g.:
task.workDir.resolve('test.txt').text = "hello $world"
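The resolution rule can be seen outside Nextflow with a small plain-Groovy sketch. It is an illustration, not Nextflow code: a temporary directory stands in for task.workDir (an assumption so the script runs on its own), and a bare relative path plays the role of file('test.txt') in the original exec: block.

```groovy
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.Paths

// Stand-in for task.workDir (assumption): a fresh temporary directory.
// Inside a real exec: block you would use task.workDir directly.
Path workDir = Files.createTempDirectory('nf-task-')

// A relative path is resolved by the JVM against the launch directory,
// which is why file('test.txt') ends up next to where Nextflow was started.
Path launchDir = Paths.get('').toAbsolutePath()
Path naive = Paths.get('test.txt').toAbsolutePath()

// Resolving the name against the work dir keeps the file inside the task directory.
Path outFile = workDir.resolve('test.txt')
Files.write(outFile, 'hello mars'.getBytes(StandardCharsets.UTF_8))

println "relative path resolves into:  ${naive.parent}"
println "workDir.resolve writes into:  ${outFile.parent}"
```

In a real process, task.workDir is already a Path, so the temporary directory is the only part that changes.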
Thinking about it more, we should check whether it's possible to intercept the file invocation within the process context and resolve relative paths against task.workDir. Tagging @jorgeaguileraseqera.
I ran into the same issue when I wanted to use some Groovy to munge data collected from a channel.
Channel
    .fromList(['a','ba','cab','done','elbow','fibers','ghastly', ''])
    .into {ch1}

process exec_to_file {
    publishDir "report"
    input:
    val consolidated from ch1.collect()
    output:
    path 'exec_ex.txt'
    exec:
    new File('./exec_ex.txt').withWriter { writer ->
        consolidated.target.each { val ->
            writer.writeLine val
        }
    }
}
Following @pditommaso's workaround worked:
exec:
outfile = task.workDir.resolve('exec_ex.txt')
outfile.withWriter { writer ->
    consolidated.target.each { val ->
        writer.writeLine val
    }
}
As a PoC I've created this branch:
https://github.com/nextflow-io/nextflow/tree/2628-code-executed-using-exec-is-executed-outside-of-work-directory
The idea is to inject the workDir into a ThreadLocal and use it in Nextflow functions such as file (as implemented in this branch), path, etc.
With this approach, DSL methods work in the process work directory out of the box, and custom scripts can still use the suggested task.workDir approach if they want.
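A minimal Groovy sketch of the ThreadLocal idea follows, assuming a hypothetical TaskDirContext class with its own file() helper; it is not the code from the PoC branch, and it assumes each task's exec: code runs on a single thread for its duration. Each simulated task binds its own work directory on its own thread, so the same relative name resolves per task and the two writers of test.txt no longer overwrite each other as in the original report.

```groovy
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.Paths

// Hypothetical holder for the current task's work directory (not Nextflow's actual class).
class TaskDirContext {
    static final ThreadLocal<Path> CURRENT = new ThreadLocal<>()

    // Runs body with dir bound as the work dir for the calling thread only,
    // and always clears the binding so a reused thread does not inherit it.
    static <T> T withWorkDir(Path dir, Closure<T> body) {
        CURRENT.set(dir)
        try {
            return body.call()
        } finally {
            CURRENT.remove()
        }
    }

    // Hypothetical file() replacement: relative names resolve against the bound
    // work dir when one is set, otherwise against the launch directory as before.
    static Path file(String name) {
        Path p = Paths.get(name)
        Path base = CURRENT.get()
        return (p.isAbsolute() || base == null) ? p.toAbsolutePath() : base.resolve(p)
    }
}

// Two simulated tasks write the same relative file name concurrently,
// each on its own thread with its own (temporary) work directory.
def planets = ['mars', 'jupiter']
Map<String, Path> dirs = planets.collectEntries { [(it): Files.createTempDirectory("nf-${it}-")] }

def threads = planets.collect { planet ->
    Thread.start {
        TaskDirContext.withWorkDir(dirs[planet]) {
            Path out = TaskDirContext.file('test.txt')
            Files.write(out, "hello ${planet}".toString().getBytes(StandardCharsets.UTF_8))
        }
    }
}
threads*.join()

dirs.each { planet, dir ->
    println "$planet -> " + new String(Files.readAllBytes(dir.resolve('test.txt')), StandardCharsets.UTF_8)
}
```

Binding per thread, rather than changing a global setting, matters because the JVM has one working directory shared by all tasks; outside any binding the helper falls back to the current launch-directory behaviour.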
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.