nextflow
[New Feature] Evaluate Closure for every input file
Until now, Nextflow has evaluated a Closure that stages multiple input files only once.
Accordingly, it cannot produce individual staging names for different files in one Channel/one task.
However, it might be helpful to evaluate the Closure for every file, as requested here: https://github.com/nextflow-io/nextflow/discussions/1998.
This PR solves the problem without changing the original logic.
If a Closure produces the same name for several files, an increasing counter is added to those names.
I also thought about applying this to the current logic for the case where you stage in as * and multiple files have the same name. However, this would skip the collision warnings that some users may expect and use for debugging.
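The behavior described above can be modeled outside Nextflow. The following Python sketch is not the PR's Groovy implementation; it only illustrates the idea of evaluating a naming function once per file and adding an increasing counter when two files receive the same name. The function name stage_names and the counter format (a number appended to the file stem) are assumptions made for illustration, since the thread does not state the exact format.

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import Callable, Iterable, List


def stage_names(files: Iterable[str],
                closure: Callable[[PurePosixPath], str]) -> List[str]:
    """Evaluate `closure` once per file and disambiguate repeated names.

    Hypothetical model: the suffix format is an assumption, and a generated
    name is not re-checked against names produced later.
    """
    seen = Counter()
    result = []
    for f in files:
        name = closure(PurePosixPath(f))
        seen[name] += 1
        if seen[name] > 1:
            # Later occurrences get an increasing counter before the extension.
            p = PurePosixPath(name)
            name = str(p.with_name(f"{p.stem}{seen[name] - 1}{p.suffix}"))
        result.append(name)
    return result


inputs = ["/data/a/x.fa", "/data/b/x.fa", "/data/a/y.fa"]

# Same name for two files: the second one receives a counter.
flat = stage_names(inputs, lambda src: src.name)

# Including the parent folder in the name avoids the collision entirely.
nested = stage_names(inputs, lambda src: f"{src.parent.name}/{src.name}")
```

With the flat naming function the two x.fa files collide and the second is renamed, while the nested naming function keeps both under their own folder.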
For example, the following code shows how to keep folder structures for inputs.
fasta = Channel.fromPath( "/root/*/*.fa" ).buffer(size:10, remainder: true)
process blastThemAll {
    input:
    file {"${sourceObj.parent}/${sourceObj.name}.fa"} from fasta
    """
    find . -name "*"
    """
}
For datacube-structured Earth Observation datasets, this PR would be extremely helpful!
Hi @pditommaso, I am reaching out regarding this PR that has been open for over a year without any action but is still of great interest. This PR allows you to dynamically name files if you stage a list of files into a process. This is particularly helpful if you want to create a dynamic folder structure.
To give you an example of why this PR is needed: we have been asked to port our Rangeland workflow to nf-core. This workflow uses FORCE, a tool that organizes files in folder structures, which Nextflow does not support out of the box. As a result, we had to rename files manually in some places, such as in the code snippet provided here.
I would greatly appreciate it if you could take some time to review this PR and provide feedback on any changes that could be made to improve it.
Can you please remind me what you are trying to solve? Nextflow already supports dynamic file name resolution. For example, given this directory
» tree data/
data/
├── one
│   └── file.txt
├── three
│   └── file.txt
└── two
    └── file.txt
and using this script
process foo {
    debug true
    input:
    tuple val(name), path("$name/*")
    '''
    tree .
    '''
}
workflow {
    channel.fromPath('data/**/*.txt').map { tuple(it.parent.name, it) } | foo
}
It returns
.
└── three
    └── file.txt -> /Users/pditommaso/demo/data/three/file.txt
1 directory, 1 file
.
└── two
    └── file.txt -> /Users/pditommaso/demo/data/two/file.txt
1 directory, 1 file
.
└── one
    └── file.txt -> /Users/pditommaso/demo/data/one/file.txt
Thank you very much for getting back on this.
Sure. I extended the case in your example so that it also works for more than one file.
With this change, you can pass multiple files into a single task while keeping their original directory structure.
In path("$name/*"), the name is fixed for the whole task, even if the task has more than one input file.
Let me extend your input:
tree data/
├── one
│   ├── file1.txt
│   ├── file2.txt
│   └── file3.txt
├── three
│   ├── file1.txt
│   ├── file2.txt
│   └── file3.txt
└── two
    ├── file1.txt
    ├── file2.txt
    └── file3.txt
Now, in your Nextflow script, I group the files by their name: all file1 together, all file2 together, and so on.
workflow {
    channel.fromPath('/execution/data/**/*.txt').map { tuple(it.name, it) }.groupTuple().map{ it[1] } | foo
}
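To make the grouping step easier to follow, here is a small Python model of what the map / groupTuple / map chain above does to the file list. It is only an illustration: group_by_file_name is a hypothetical helper, and unlike this sketch, Nextflow does not promise any particular emission order for the groups.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Iterable, List


def group_by_file_name(paths: Iterable[str]) -> List[List[str]]:
    """Model of .map { tuple(it.name, it) }.groupTuple().map { it[1] }."""
    groups = defaultdict(list)
    for p in map(PurePosixPath, paths):
        # Key on the bare file name, keep the full path as the value.
        groups[p.name].append(str(p))
    # Drop the keys, as the final map { it[1] } does.
    return list(groups.values())


paths = [
    f"/execution/data/{folder}/file{i}.txt"
    for folder in ("one", "three", "two")
    for i in (1, 2, 3)
]
tasks = group_by_file_name(paths)
```

Each inner list corresponds to the inputs of one foo task: three files that share a name but come from different folders.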
With the current Nextflow version, I wouldn't be able to get the following:
[74/c871e1] process > foo (2) [100%] 3 of 3 ✔
.
├── one
│   └── file3.txt -> /execution/data/one/file3.txt
├── three
│   └── file3.txt -> /execution/data/three/file3.txt
└── two
    └── file3.txt -> /execution/data/two/file3.txt
3 directories, 3 files
.
├── one
│   └── file1.txt -> /execution/data/one/file1.txt
├── three
│   └── file1.txt -> /execution/data/three/file1.txt
└── two
    └── file1.txt -> /execution/data/two/file1.txt
3 directories, 3 files
.
├── one
│   └── file2.txt -> /execution/data/one/file2.txt
├── three
│   └── file2.txt -> /execution/data/three/file2.txt
└── two
    └── file2.txt -> /execution/data/two/file2.txt
3 directories, 3 files
With my adjustment, however, this works once the input is changed to:
    input:
    path ("${sourceObj.parent.name}/*")
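The difference between a pattern resolved once per task and a pattern resolved per file can be sketched in Python. This is a simplified model, not Nextflow's code: stage_once, stage_per_file and expand are hypothetical names, source_obj stands in for the sourceObj variable used above, and only the "dir/*" form (original file names placed inside dir) is modeled.

```python
from pathlib import PurePosixPath
from typing import Callable, List


def expand(pattern: str, source: PurePosixPath) -> str:
    """Resolve a stage pattern for one file (only 'dir/*' is modeled)."""
    if pattern.endswith("/*"):
        return f"{pattern[:-2]}/{source.name}"
    return pattern


def stage_once(files: List[str], pattern: str) -> List[str]:
    """One pattern per task, shared by every file in the task."""
    return [expand(pattern, PurePosixPath(f)) for f in files]


def stage_per_file(files: List[str],
                   closure: Callable[[PurePosixPath], str]) -> List[str]:
    """Pattern re-evaluated for each file, with the file passed in."""
    return [expand(closure(PurePosixPath(f)), PurePosixPath(f)) for f in files]


files = [f"/execution/data/{d}/file1.txt" for d in ("one", "three", "two")]

# Fixed pattern: all three files map to the same staged name.
fixed = stage_once(files, "one/*")

# Per-file pattern: each file keeps its own parent folder.
dynamic = stage_per_file(
    files, lambda source_obj: f"{source_obj.parent.name}/*"
)
```

In this model the fixed pattern yields three identical staged names, which is the collision case, while the per-file pattern reproduces the one/, three/ and two/ layout shown in the output above.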
This way of organizing data is frequently used for data cubes in remote sensing, so supporting it in Nextflow would make Nextflow easier to use for remote sensing workflows that work with data cubes.