Fix collectFile cache issue causing software version inconsistencies
Summary
- Adds the cache: false parameter to the collectFile() call for software versions collection in the pipeline template
- Prevents inconsistencies in software version reporting when using the Nextflow resume function
Problem
When collectFile() uses storeDir with caching enabled, it can lead to missing or additional processes listed in software version reports across multiple pipeline runs with resume. This happens because:
- First run caches software versions in results directory
- Second run with different parameters (e.g., --skip_gprofiler) creates new cached versions
- Third run with resume may use inconsistent cached data from different runs
Solution
Adding cache: false to the collectFile() call ensures software versions are always collected fresh and consistent with the actual processes that ran.
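As a rough illustration of the change, the sketch below shows a collectFile() call that combines storeDir with cache: false. It is not the actual template code: params.outdir, the output file name, the toy ch_versions channel, and the version strings are hypothetical placeholders. Only collectFile(), storeDir, cache: false, and ch_collated_versions come from this PR.

```nextflow
// Minimal sketch, not the actual template code.
// Assumptions: params.outdir, the file name, and the toy ch_versions
// channel (including the version numbers) are hypothetical placeholders.
params.outdir = 'results'

workflow {
    // Stand-in for the per-process version strings a real pipeline emits
    ch_versions = Channel.of(
        'FASTQC:\n  fastqc: 0.12.1',
        'MULTIQC:\n  multiqc: 1.21'
    )

    ch_versions
        .unique()
        .collectFile(
            storeDir: "${params.outdir}/pipeline_info",
            name: 'software_versions.yml',
            sort: true,
            newLine: true,
            cache: false   // do not reuse previously collected content on resume
        )
        .set { ch_collated_versions }

    // Print the path of the collated file so the result is visible
    ch_collated_versions.view { f -> "Collated versions written to: ${f}" }
}
```

With cache: false, the collated file is rebuilt on every run from whatever the channel emits in that run, rather than from entries accumulated in earlier runs.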
Test plan
- [x] Verify template syntax is correct
- [x] Pre-commit hooks pass
- [ ] CI tests pass
- [ ] Generated pipelines work correctly with resume functionality
Fixes #3653
🤖 Generated with Claude Code
This PR is against the main branch :x:
- Do not close this PR
- Click Edit and change the base to dev
- This CI test will remain failed until you push a new commit
Hi @ewels,
It looks like this pull-request has been made against the ewels/nf-core-tools main branch.
The main branch on nf-core repositories should always contain code from the latest release.
Because of this, PRs to main are only allowed if they come from the ewels/nf-core-tools dev branch.
You do not need to close this PR, you can change the target branch to dev by clicking the "Edit" button at the top of this page.
Note that even after this, the test will continue to show as failing until you push a new commit.
Thanks again for your contribution!
@nf-core-bot changelog
@ewels Thanks for creating the pull request! This fixed the issue for me. However, it will make resuming/caching of downstream processes (multiqc) also impossible, right?
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 76.75%. Comparing base (03628e8) to head (531be4b).
> However, it will make resuming/caching of downstream processes (multiqc) also impossible, right?
Yes. It will break the cache every time for any consumers of the ch_collated_versions channel, and for any processes downstream of those.
Generally there aren't any processes downstream of MultiQC, as it's typically used as a final step that summarises the run, but you're right that this is something we should be intentional about. I think that we used to have the cache disabled for MultiQC anyway, but I can't find that config now, so maybe it was dropped. I'll raise this in Slack on the #tools channel.
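For reference, a per-process cache override of that kind would presumably look something like the config sketch below. This is an assumption about its shape, not the config that was dropped, and the 'MULTIQC' selector is an assumed process name.

```nextflow
// Hypothetical config sketch: 'MULTIQC' is an assumed process name,
// and this is not recovered from any earlier template version.
process {
    withName: 'MULTIQC' {
        cache = false   // always re-run this process, even with -resume
    }
}
```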
@ewels does this also fix https://github.com/nf-core/tools/issues/3110?
Or does it make it worse? 😱
It makes it worse..
Why not make it a native process?
I'm assuming this will be greatly improved by usage of topics and workflow output
Something I just discovered in my own pipeline: many pipelines use .first() to take the first copy of the versions.yml. However, the versions.yml that gets selected out of, say, 5 runs of the same process is not always the same. This means the path to the versions.yml may change, so the input set changes and caching is prevented as well.
I have found that in my resumed pipeline the software versions YAML simply isn't being updated at all. I'm not sure if this is related to the bug/behaviour that this PR addresses, but I'm leaving a comment here so it's noted somewhere.