
Cache invalidated when $launchDir is used (no resume possible)

Open tamasgal opened this issue 2 years ago • 3 comments

Bug report

I am not sure whether this is a bug or a feature, but it is definitely worth mentioning, at least on https://www.nextflow.io/blog/2019/troubleshooting-nextflow-resume.html, because I find this behaviour unintuitive and it took me quite a while to debug (I read through a lot of hash-trace diffs 🙈).

When a process uses the $launchDir meta variable (or $workflow.launchDir), the cache stops working with -resume. I don't understand how this is achieved, but Nextflow apparently knows that the value is a path and treats it as an input. If the modification time of the folder has changed (which happens, e.g., when a new file is created inside it), the cache is invalidated. Since Nextflow writes its log files to $launchDir/.nextflow.log* by default, no process that uses the $launchDir variable in any way can be resumed.

Expected behavior and actual behavior

Actual: running the workflow multiple times with -resume does not use the cache when $launchDir appears in the script section. Expected: $launchDir should not have any impact on the cache ingredients (hashes and timestamps of inputs/outputs).

Steps to reproduce the problem

Run the workflow below with nextflow run workflow.nf -resume multiple times to see that the cache is not used.

#!/usr/bin/env nextflow
nextflow.enable.dsl = 2

process Process {
  input:
    val(input)
  output:
    file('*.txt')

  script:
      """
      echo $workflow.launchDir
      touch "a.txt"
      """
}

workflow {
  foo = Channel.from(1,2,3)
  Process(foo)
}

Program output

░ tgal@cca008:/sps/km3net/users/tgal/tmp/nextflow-cache
░ 18:38:25 > nextflow run workflow.nf -resume
N E X T F L O W  ~  version 21.10.3
Launching `workflow2.nf` [grave_waddington] - revision: 3ee6c5a013
executor >  local (3)
[49/67ffb1] process > Process (1) [100%] 3 of 3 ✔

░ tgal@cca008:/sps/km3net/users/tgal/tmp/nextflow-cache
░ 18:38:54 > nextflow run workflow.nf -resume
N E X T F L O W  ~  version 21.10.3
Launching `workflow2.nf` [kickass_volhard] - revision: 3ee6c5a013
executor >  local (3)
[f6/5b035c] process > Process (3) [100%] 3 of 3 ✔

Environment

  • Nextflow version: 21.10.3.5655
  • Java version: openjdk version "1.8.0_312", openjdk version "11.0.12" 2021-07-20 LTS
  • Operating system: macOS Big Sur, CentOS 7, ArchLinux (2021.09)
  • Bash version: zsh 5.0.2 (x86_64-redhat-linux-gnu), zsh 5.8 (x86_64-apple-darwin20.0)

tamasgal, Dec 01 '21 08:12

I have this same issue.

peddamat, Dec 09 '21 02:12

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot], Jun 11 '22 03:06

This happens because launchDir is a directory Path. Every time the execution is launched, the directory has different content and therefore produces a different hash, which invalidates the cache.

If you want to prevent that, use workflow.launchDir.toString()
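
For illustration, here is one way the reproducer from the report could be adapted. This is only a sketch and has not been verified against 21.10.3: it assumes that converting the path to a String in the workflow block and passing it to the process as a val input makes Nextflow hash the string value instead of the directory. The names launch_dir (both the workflow variable and the process input) are hypothetical and do not come from the report.

```nextflow
#!/usr/bin/env nextflow
nextflow.enable.dsl = 2

process Process {
  input:
    val(input)
    val(launch_dir)   // hypothetical input: a plain String, not a Path

  output:
    file('*.txt')

  script:
      """
      echo $launch_dir
      touch "a.txt"
      """
}

workflow {
  foo = Channel.from(1,2,3)
  // Assumption: converting once here keeps the directory Path out of the task hash
  launch_dir = workflow.launchDir.toString()
  Process(foo, launch_dir)
}
```

If the assumption holds, the second nextflow run workflow.nf -resume should report the three tasks as cached rather than re-executing them.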

pditommaso avatar Sep 21 '22 08:09 pditommaso