The nxf_unstage function unnecessarily copies input files that match the output glob pattern.

Open • robsyme opened this issue 1 year ago • 2 comments

Bug report

Many users set the scratch true directive in part to minimize the size of the shared work directory, so that the only files saved there are those needed by downstream tasks and by the resume mechanism.

In cases where a process's output glob pattern also matches an input file, that input file is unnecessarily copied back into the shared work directory.

Steps to reproduce the problem

Given main.nf:

process GreedyOutputGlob {
    scratch true
    input: path(csv)
    output: path("*.csv")
    script: "cp $csv out.csv"
}

workflow {
    Channel.fromPath("data/in.csv")
    | GreedyOutputGlob
    | view
}

Note that the in.csv file is copied back to the shared work directory:

❯ nextflow run .      
N E X T F L O W  ~  version 23.04.1
Launching `./main.nf` [hopeful_church] DSL2 - revision: 06d2458686
executor >  local (1)
[42/2fa08b] process > GreedyOutputGlob (1) [100%] 1 of 1 ✔
/private/tmp/foo/work/42/2fa08b2ef83cd1799c58833592deed/out.csv


❯ tree work 
work
└── 42
    └── 2fa08b2ef83cd1799c58833592deed
        ├── in.csv
        └── out.csv

3 directories, 2 files

This is because the nxf_unstage function uses the output glob pattern directly, without regard to the input files:

# ...
for name in $(eval "ls -1d *.csv" | sort | uniq); do
    nxf_fs_copy "$name" /private/tmp/foo/work/42/2fa08b2ef83cd1799c58833592deed || true
done
# ...
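
The effect can be reproduced outside Nextflow with a plain shell sketch. The temporary directories and the symlinked input are assumptions made for illustration; they only mimic a scratch directory and are not taken from the generated task script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-ins for the pipeline data directory and the task scratch directory.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/data" "$demo_root/scratch"
echo "a,b" > "$demo_root/data/in.csv"

cd "$demo_root/scratch"
ln -s "$demo_root/data/in.csv" in.csv   # assumption: the input is staged as a symlink
cp in.csv out.csv                        # the task script from the example process

# Same listing expression as in the quoted unstage loop.
matched=$(eval "ls -1d *.csv" | sort | uniq)
echo "$matched"
```

Both in.csv and out.csv are listed, so a loop over this result copies the staged input as well as the real output.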

Expected behaviour and actual behaviour

To spare users from storing duplicated input files, it would be better if Nextflow excluded input files from being copied back to the shared work directory (unless the includeInputs: true option is set in the output: block).
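
One way the exclusion could look is sketched below. This is not Nextflow's implementation: list_outputs_excluding_inputs is a hypothetical helper, and passing the staged input names as arguments is an assumption about what the unstage step would have available.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the glob matches that are not staged inputs.
# $1 is the output glob; the remaining arguments are staged input names.
# Like the quoted loop, it relies on word splitting, so names with spaces are not handled.
list_outputs_excluding_inputs() {
    local pattern=$1; shift
    local name input skip
    for name in $(eval "ls -1d $pattern" | sort | uniq); do
        skip=0
        for input in "$@"; do
            if [ "$name" = "$input" ]; then skip=1; break; fi
        done
        if [ "$skip" -eq 0 ]; then echo "$name"; fi
    done
}

# Demonstration in a throwaway directory with one input and one output.
demo_dir=$(mktemp -d)
cd "$demo_dir"
touch in.csv out.csv
kept=$(list_outputs_excluding_inputs "*.csv" in.csv)
echo "$kept"
```

In this sketch only out.csv survives the filter. A real change would also need to skip the filter when includeInputs: true is set.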

Environment

  • Nextflow version: 23.04.1
  • Java version: openjdk version "17.0.5" 2022-10-18
  • Operating system: all
  • Bash version: all

robsyme • Jun 02 '23 17:06