rnaseq
Use eval output for tool versions
This PR uses the experimental cmd output type in https://github.com/nextflow-io/nextflow/pull/4493 to simplify the collection of tool versions.
Once the topic channel support is merged into Nextflow, we can merge this PR with #1109 to simplify things further. Instead of emitting versions1, versions2, etc. for processes with multiple tools, we can simply send them all to the 'versions' topic.
PR checklist
- [ ] This comment contains a description of changes (with reason).
- [ ] Make sure your code lints (nf-core lint).
- [ ] Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
- [ ] Usage Documentation in docs/usage.md is updated.
- [ ] Output Documentation in docs/output.md is updated.
- [ ] CHANGELOG.md is updated.
- [ ] README.md is updated (including new tool citations and authors/contributors).
For some reason the SALMON_QUANT process is failing:
Caused by:
Process `NFCORE_RNASEQ:RNASEQ:QUANTIFY_PSEUDO_ALIGNMENT:SALMON_QUANT (RAP1_UNINDUCED_REP2)` terminated with an error exit status (1)
Command executed:
salmon quant \
--geneMap genome_gfp.gtf \
--threads 2 \
--libType=SR \
--index salmon \
-r RAP1_UNINDUCED_REP2_primary.fastq.gz \
\
-o RAP1_UNINDUCED_REP2
if [ -f RAP1_UNINDUCED_REP2/aux_info/meta_info.json ]; then
cp RAP1_UNINDUCED_REP2/aux_info/meta_info.json "RAP1_UNINDUCED_REP2_meta_info.json"
fi
Command exit status:
1
Command output:
(empty)
Command error:
Version Info: This is the most recent version of salmon.
### salmon (selective-alignment-based) v1.10.1
### [ program ] => salmon
### [ command ] => quant
### [ geneMap ] => { genome_gfp.gtf }
### [ threads ] => { 2 }
### [ libType ] => { SR }
### [ index ] => { salmon }
### [ unmatedReads ] => { RAP1_UNINDUCED_REP2_primary.fastq.gz }
### [ output ] => { RAP1_UNINDUCED_REP2 }
Logs will be written to RAP1_UNINDUCED_REP2/logs
[2023-11-15 06:41:55.947] [jointLog] [info] setting maxHashResizeThreads to 2
-----------------------------------------
| Loading contig table | Time = 63.579 us
-----------------------------------------
size = 840
[2023-11-15 06:41:55.947] [jointLog] [info] Fragment incompatibility prior below threshold. Incompatible fragments will be ignored.
[2023-11-15 06:41:55.947] [jointLog] [info] Usage of --validateMappings implies use of minScoreFraction. Since not explicitly specified, it is being set to 0.65
[2023-11-15 06:41:55.947] [jointLog] [info] Setting consensusSlack to selective-alignment default of 0.35.
[2023-11-15 06:41:55.947] [jointLog] [info] parsing read library format
[2023-11-15 06:41:55.947] [jointLog] [info] There is 1 library.
[2023-11-15 06:41:55.947] [jointLog] [info] Loading pufferfish index
[2023-11-15 06:41:55.947] [jointLog] [info] Loading dense pufferfish index.
-----------------------------------------
| Loading contig offsets | Time = 1.6262 ms
-----------------------------------------
-----------------------------------------
| Loading reference lengths | Time = 3.809 us
-----------------------------------------
-----------------------------------------
| Loading mphf table | Time = 77.721 us
-----------------------------------------
size = 247935
Number of ones: 839
Number of ones per inventory item: 512
Inventory entries filled: 2
-----------------------------------------
| Loading contig boundaries | Time = 478.93 us
-----------------------------------------
size = 247935
-----------------------------------------
| Loading sequence | Time = 95.192 us
-----------------------------------------
size = 222765
-----------------------------------------
| Loading positions | Time = 346.52 us
-----------------------------------------
size = 381211
-----------------------------------------
| Loading reference sequence | Time = 86.075 us
-----------------------------------------
-----------------------------------------
| Loading reference accumulative lengths | Time = 2.828 us
-----------------------------------------
[2023-11-15 06:41:55.950] [jointLog] [info] done
[2023-11-15 06:41:56.000] [jointLog] [info] Index contained 126 targets
[2023-11-15 06:41:56.000] [jointLog] [info] Number of decoys : 1
[2023-11-15 06:41:56.000] [jointLog] [info] First decoy index : 125
Error: no valid ID found for GFF record
Maybe dev is broken?
Fetching upstream fixed it 👍
nf-core lint overall result: Passed :white_check_mark: :warning:
Posted for pipeline commit e68f451
+| ✅ 144 tests passed |+
#| ❔ 6 tests were ignored |#
!| ❗ 5 tests had warnings |!
:heavy_exclamation_mark: Test warnings:
- files_exist - File not found: .github/workflows/awstest.yml
- files_exist - File not found: .github/workflows/awsfulltest.yml
- nextflow_config - Config manifest.version should end in dev: 3.13.0
- pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
- pipeline_todos - TODO string in WorkflowRnaseq.groovy: Optionally add in-text citation tools to this list.
:grey_question: Tests ignored:
- files_unchanged - File ignored due to lint config: assets/email_template.html
- files_unchanged - File ignored due to lint config: assets/email_template.txt
- files_unchanged - File ignored due to lint config: lib/NfcoreTemplate.groovy
- files_unchanged - File ignored due to lint config: .gitignore or .prettierignore or pyproject.toml
- actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/rnaseq/rnaseq/.github/workflows/awstest.yml
- multiqc_config - multiqc_config
:white_check_mark: Tests passed:
- files_exist - File found: .gitattributes
- files_exist - File found: .gitignore
- files_exist - File found: .nf-core.yml
- files_exist - File found: .editorconfig
- files_exist - File found: .prettierignore
- files_exist - File found: .prettierrc.yml
- files_exist - File found: CHANGELOG.md
- files_exist - File found: CITATIONS.md
- files_exist - File found: CODE_OF_CONDUCT.md
- files_exist - File found: CODE_OF_CONDUCT.md
- files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
- files_exist - File found: nextflow_schema.json
- files_exist - File found: nextflow.config
- files_exist - File found: README.md
- files_exist - File found: .github/.dockstore.yml
- files_exist - File found: .github/CONTRIBUTING.md
- files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
- files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
- files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
- files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
- files_exist - File found: .github/workflows/branch.yml
- files_exist - File found: .github/workflows/ci.yml
- files_exist - File found: .github/workflows/linting_comment.yml
- files_exist - File found: .github/workflows/linting.yml
- files_exist - File found: assets/email_template.html
- files_exist - File found: assets/email_template.txt
- files_exist - File found: assets/sendmail_template.txt
- files_exist - File found: assets/nf-core-rnaseq_logo_light.png
- files_exist - File found: conf/modules.config
- files_exist - File found: conf/test.config
- files_exist - File found: conf/test_full.config
- files_exist - File found: docs/images/nf-core-rnaseq_logo_light.png
- files_exist - File found: docs/images/nf-core-rnaseq_logo_dark.png
- files_exist - File found: docs/output.md
- files_exist - File found: docs/README.md
- files_exist - File found: docs/README.md
- files_exist - File found: docs/usage.md
- files_exist - File found: lib/nfcore_external_java_deps.jar
- files_exist - File found: lib/NfcoreTemplate.groovy
- files_exist - File found: lib/Utils.groovy
- files_exist - File found: lib/WorkflowMain.groovy
- files_exist - File found: main.nf
- files_exist - File found: assets/multiqc_config.yml
- files_exist - File found: conf/base.config
- files_exist - File found: conf/igenomes.config
- files_exist - File found: lib/WorkflowRnaseq.groovy
- files_exist - File found: modules.json
- files_exist - File found: pyproject.toml
- files_exist - File not found check: Singularity
- files_exist - File not found check: parameters.settings.json
- files_exist - File not found check: pipeline_template.yml
- files_exist - File not found check: .nf-core.yaml
- files_exist - File not found check: bin/markdown_to_html.r
- files_exist - File not found check: conf/aws.config
- files_exist - File not found check: .github/workflows/push_dockerhub.yml
- files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
- files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
- files_exist - File not found check: docs/images/nf-core-rnaseq_logo.png
- files_exist - File not found check: .markdownlint.yml
- files_exist - File not found check: .yamllint.yml
- files_exist - File not found check: lib/Checks.groovy
- files_exist - File not found check: lib/Completion.groovy
- files_exist - File not found check: lib/Workflow.groovy
- files_exist - File not found check: .travis.yml
- nextflow_config - Config variable found: manifest.name
- nextflow_config - Config variable found: manifest.nextflowVersion
- nextflow_config - Config variable found: manifest.description
- nextflow_config - Config variable found: manifest.version
- nextflow_config - Config variable found: manifest.homePage
- nextflow_config - Config variable found: timeline.enabled
- nextflow_config - Config variable found: trace.enabled
- nextflow_config - Config variable found: report.enabled
- nextflow_config - Config variable found: dag.enabled
- nextflow_config - Config variable found: process.cpus
- nextflow_config - Config variable found: process.memory
- nextflow_config - Config variable found: process.time
- nextflow_config - Config variable found: params.outdir
- nextflow_config - Config variable found: params.input
- nextflow_config - Config variable found: params.validationShowHiddenParams
- nextflow_config - Config variable found: params.validationSchemaIgnoreParams
- nextflow_config - Config variable found: manifest.mainScript
- nextflow_config - Config variable found: timeline.file
- nextflow_config - Config variable found: trace.file
- nextflow_config - Config variable found: report.file
- nextflow_config - Config variable found: dag.file
- nextflow_config - Config variable (correctly) not found: params.nf_required_version
- nextflow_config - Config variable (correctly) not found: params.container
- nextflow_config - Config variable (correctly) not found: params.singleEnd
- nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
- nextflow_config - Config variable (correctly) not found: params.name
- nextflow_config - Config variable (correctly) not found: params.enable_conda
- nextflow_config - Config timeline.enabled had correct value: true
- nextflow_config - Config report.enabled had correct value: true
- nextflow_config - Config trace.enabled had correct value: true
- nextflow_config - Config dag.enabled had correct value: true
- nextflow_config - Config variable manifest.name began with nf-core/
- nextflow_config - Config variable manifest.homePage began with https://github.com/nf-core/
- nextflow_config - Config dag.file ended with .html
- nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
- nextflow_config - Config params.custom_config_version is set to master
- nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
- nextflow_config - Lines for loading custom profiles found
- files_unchanged - .gitattributes matches the template
- files_unchanged - .prettierrc.yml matches the template
- files_unchanged - CODE_OF_CONDUCT.md matches the template
- files_unchanged - LICENSE matches the template
- files_unchanged - .github/.dockstore.yml matches the template
- files_unchanged - .github/CONTRIBUTING.md matches the template
- files_unchanged - .github/ISSUE_TEMPLATE/bug_report.yml matches the template
- files_unchanged - .github/ISSUE_TEMPLATE/config.yml matches the template
- files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
- files_unchanged - .github/PULL_REQUEST_TEMPLATE.md matches the template
- files_unchanged - .github/workflows/branch.yml matches the template
- files_unchanged - .github/workflows/linting_comment.yml matches the template
- files_unchanged - .github/workflows/linting.yml matches the template
- files_unchanged - assets/sendmail_template.txt matches the template
- files_unchanged - assets/nf-core-rnaseq_logo_light.png matches the template
- files_unchanged - docs/images/nf-core-rnaseq_logo_light.png matches the template
- files_unchanged - docs/images/nf-core-rnaseq_logo_dark.png matches the template
- files_unchanged - docs/README.md matches the template
- files_unchanged - lib/nfcore_external_java_deps.jar matches the template
- actions_ci - '.github/workflows/ci.yml' is triggered on expected events
- actions_ci - '.github/workflows/ci.yml' checks minimum NF version
- readme - README Nextflow minimum version badge matched config. Badge: 23.04.0, Config: 23.04.0
- readme - README Zenodo placeholder was replaced with DOI.
- pipeline_name_conventions - Name adheres to nf-core convention
- template_strings - Did not find any Jinja template strings (257 files)
- schema_lint - Schema lint passed
- schema_lint - Schema title + description lint passed
- schema_lint - Input mimetype lint passed: 'text/csv'
- schema_params - Schema matched params returned from nextflow config
- system_exit - No System.exit calls found
- actions_schema_validation - Workflow validation passed: branch.yml
- actions_schema_validation - Workflow validation passed: release-announcments.yml
- actions_schema_validation - Workflow validation passed: linting_comment.yml
- actions_schema_validation - Workflow validation passed: cloud_tests_small.yml
- actions_schema_validation - Workflow validation passed: ci.yml
- actions_schema_validation - Workflow validation passed: clean-up.yml
- actions_schema_validation - Workflow validation passed: linting.yml
- actions_schema_validation - Workflow validation passed: fix-linting.yml
- actions_schema_validation - Workflow validation passed: cloud_tests_full.yml
- merge_markers - No merge markers found in pipeline files
- modules_json - Only installed modules found in modules.json
- modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'
Run details
- nf-core/tools version 2.10
- Run at 2023-11-16 01:26:58
@drpatelh @ewels now that Nextflow has channel topics, it occurred to me that we could actually simplify a lot by just using env outputs. See my comment here, but I will copy the example code to illustrate my point:
// current nf-core convention
process FOOBAR {
output:
path 'versions.yml', topic: versions
"""
# ...
cat <<-END_VERSIONS > versions.yml
"${task.process}":
foo: \$(foo --version)
bar: \$(bar --version)
END_VERSIONS
"""
}
// env output
process FOOBAR {
output:
tuple val("${task.process}"), val('foo'), env(FOO_VERSION), topic: versions
tuple val("${task.process}"), val('bar'), env(BAR_VERSION), topic: versions
"""
# ...
FOO_VERSION=\$(foo --version)
BAR_VERSION=\$(bar --version)
"""
}
// cmd output
process FOOBAR {
output:
tuple val("${task.process}"), val('foo'), cmd('foo --version'), topic: versions
tuple val("${task.process}"), val('bar'), cmd('bar --version'), topic: versions
"""
# ...
"""
}
I would love to hear what you guys think about (2) vs (3). Keep in mind that in all three cases, the tool version commands are executed in the task script in more or less the same way.
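To make that last point concrete, the Bash below is roughly what the task script ends up running for the current convention and for the env variant. It is an illustrative sketch, not the script Nextflow actually generates: `foo` and `bar` are hypothetical stubs standing in for real tools, and `PROCESS` stands in for the `${task.process}` value that Nextflow interpolates. In the Nextflow script strings above, the command substitutions are written as `\$(...)` so that Groovy leaves them for Bash to evaluate.

```bash
# Hypothetical stand-ins for real tools: each prints a version string.
foo() { echo "1.2.3"; }
bar() { echo "4.5.6"; }

# Stand-in for the ${task.process} value Nextflow interpolates into the script.
PROCESS="FOOBAR"

# Current convention: the script itself writes versions.yml.
cat <<END_VERSIONS > versions.yml
"${PROCESS}":
    foo: $(foo --version)
    bar: $(bar --version)
END_VERSIONS

# env variant: the script only sets variables, and Nextflow captures
# FOO_VERSION and BAR_VERSION from the task environment when the script ends.
FOO_VERSION=$(foo --version)
BAR_VERSION=$(bar --version)

cat versions.yml
echo "FOO_VERSION=${FOO_VERSION} BAR_VERSION=${BAR_VERSION}"
```

Either way the same `--version` calls run in the same task shell. Only the place where the result is collected differs.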
My preference is for option 3 - the new cmd style. I like keeping the version commands out of the script, as it makes the script commands much cleaner and easier to read.
NB: The versions1/versions2 stuff in the PR code diff can be simplified after https://github.com/nf-core/rnaseq/pull/1109 is merged. This new syntax is shown in Ben's comment.
My preference goes to version 2, which I find more explicit and easier to read, but I do love that version 3 completely removes the version generation from the script itself.
I like three; my only concern is that some of the commands to get the version are pretty long. In theory, we could do something like:
def foo_version = 'foo --version'
output:
tuple val("${task.process}"), val('foo'), cmd("${foo_version}"), topic: versions
I like option 3!
But I wonder about modules with R or python scripts, where we use those languages to create the versions.yml instead of bash. Will this work? or do we have to continue using the old syntax for these cases?
I would really like the inputs/outputs section to remain as concise as possible, and I like the separation of concerns where the command to produce the output happens in more or less the same place. I'd do a bit of a WTF if people suddenly started embedding extensive process stuff where I expect the I/O.
So I have a fairly strong dislike for option 3), I think some fairly horrific stuff could happen there and make the processes hard to understand.
So option 2) for me please!
Agreeing with @pinin4fjords there: version 3 looks beautiful as long as everything works well, but when it starts to break, it's a mess to debug.
@pinin4fjords - note that one of the limitations of cmd (which will be documented) is that it doesn't support newlines.
That will hopefully prevent people from doing anything too horrendous 😆
We could have an nf-core modules linting rule that checks the string length and fails if it's too long, suggesting that people use env in that particular case instead.
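A lint rule along those lines could start out as simple as the Bash sketch below. It is hypothetical and does not describe an existing nf-core lint test. The 80-character default is an arbitrary assumption, and the regex only handles `cmd('...')` and `cmd("...")` arguments that contain no nested quotes.

```bash
# Hypothetical lint check (not an existing nf-core lint test): flag version
# commands passed to cmd('...') or cmd("...") that exceed a length limit.
# Limitation: arguments containing nested quotes are not matched.
lint_cmd_length() {
    local file="$1" max="${2:-80}" status=0 cmd
    while IFS= read -r cmd; do
        if [ "${#cmd}" -gt "$max" ]; then
            echo "${file}: cmd() argument is ${#cmd} chars (limit ${max}); consider an env output: ${cmd}" >&2
            status=1
        fi
    done < <(
        # Extract each cmd(...) call, then strip the wrapper to leave the command.
        grep -oE "cmd\\(['\"][^'\"]*['\"]\\)" "$file" |
            sed -E "s/^cmd\\(['\"]//; s/['\"]\\)\$//"
    )
    return "$status"
}
```

Running `lint_cmd_length main.nf 80` returns non-zero and prints a hint on stderr for every over-long command. That non-zero status is the "fail and suggest env instead" behaviour described above.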
> @pinin4fjords - note that one of the limitations of cmd (which will be documented) is that it doesn't support newlines.
> That will hopefully prevent people from doing anything too horrendous 😆
There's plenty of evil to be done with pipes!
> But I wonder about modules with R or python scripts, where we use those languages to create the versions.yml instead of bash. Will this work? or do we have to continue using the old syntax for these cases?
@mirpedrol - No, it won't work. The suggestion would be to use env in these cases as in option 2 (no need for the old syntax with the cat <<-END_VERSIONS stuff). But there are relatively few of these non-bash modules; none in rnaseq, for example, I think.
rnaseq actually has a couple of R modules; they're just not obvious because they're local. We will hopefully fix that at some point, and they will then need templates etc.
Thank you all for your feedback. I still prefer env myself, but Paolo is determined now to add the cmd type, so we will have both and you can use whichever one you prefer.
> My preference goes to version 2, which I find more explicit and easier to read, but I do love that version 3 completely removes the version generation from the script itself.
Note that the cmd type is still executed in the task script just like an env; it just inserts the command for you.
> I like three; my only concern is that some of the commands to get the version are pretty long.
@Emiller88 I don't think you can reference local variables in an output as in your example, but you could reference a global variable, for example:
foo_version = 'really | long | version | command'
process foo {
output:
cmd("${foo_version}")
}
> But I wonder about modules with R or python scripts, where we use those languages to create the versions.yml instead of bash. Will this work? or do we have to continue using the old syntax for these cases?
@mirpedrol In this PR I changed all the processes to only emit the metadata, and then the YAML is constructed at the end of the pipeline. If you usually generate the tool version from within a Python or R script, the cmd output could do something like python script.py --version to retrieve the version from Bash. If the process script itself is not Bash, however, then the cmd output won't work. So whenever cmd isn't supported or would be unwieldy to use, you can always fall back to an env output.
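As a rough Bash/awk analogue of building the YAML at the end of the pipeline, the sketch below deduplicates the per-task records and groups them by process. It is not the PR's code, which works on Nextflow channels. The tab-separated `process  tool  version` record format is an assumption made for this sketch.

```bash
# Sketch: turn (process, tool, version) records into a versions.yml-style
# document. Input is one tab-separated record per line on stdin (an assumed
# format for this sketch; in the pipeline the records travel on a channel).
versions_to_yaml() {
    # Every task of a process emits the same record, so drop duplicates.
    # C-locale sorting also makes all records of one process adjacent.
    LC_ALL=C sort -u |
        awk -F '\t' '
            # Start a new top-level key whenever the process name changes.
            $1 != current { printf "\"%s\":\n", $1; current = $1 }
            # Each tool becomes an indented key under its process.
            { printf "    %s: %s\n", $2, $3 }
        '
}
```

For example, `printf 'SALMON_QUANT\tsalmon\t1.10.1\nSALMON_QUANT\tsalmon\t1.10.1\n' | versions_to_yaml` emits a single `"SALMON_QUANT":` entry with one `salmon: 1.10.1` line, however many samples reported it.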
> note that one of the limitations of cmd (which will be documented) is that it doesn't support newlines
You could have a multi-line command by using semi-colons for newlines 😅
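Joking aside, semicolon-joined statements really are equivalent to separate lines in Bash. The sketch below runs the same three-step version extraction once as a multi-line function and once as a single-line string, and both print the same result. `foo` is a hypothetical stub that prints `foo v1.2.3`.

```bash
# Hypothetical tool stub that prints its version with a "foo v" prefix.
foo() { echo "foo v1.2.3"; }

# Multi-line form, as it might be written in a script block.
multi_line() {
    local v
    v=$(foo --version)
    v=${v#foo v}   # strip the "foo v" prefix
    echo "$v"
}

# The same steps on a single line, joined with semicolons.
one_line='v=$(foo --version); v=${v#foo v}; echo "$v"'

multi_line             # prints 1.2.3
( eval "$one_line" )   # prints 1.2.3 (the subshell keeps v out of this shell)
```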
Regarding multi-line outputs, we found a way to support them for both env and cmd. So whereas currently env outputs are squashed to a single line, both will support multi-line output going forward.
So I think everyone agrees that options 2 + 3 are both improvements ✅
For any processes with script blocks written in languages other than bash, we will have to use the env approach. For bash commands I see now three options, which maybe we can vote on in the nf-core Slack:
Option 1: env
// env output
process FOOBAR {
output:
tuple val("${task.process}"), val('foo'), env(FOO_VERSION), topic: versions
tuple val("${task.process}"), val('bar'), env(BAR_VERSION), topic: versions
"""
# ...
FOO_VERSION=\$(foo --version)
BAR_VERSION=\$(bar --version)
"""
}
Option 2: cmd
// cmd output
process FOOBAR {
output:
tuple val("${task.process}"), val('foo'), cmd('foo --version'), topic: versions
tuple val("${task.process}"), val('bar'), cmd('bar --version'), topic: versions
"""
# ...
"""
}
Option 3: cmd + variable
// cmd + variable output
foo_version = 'foo --version'
bar_version = 'bar --version'
process FOOBAR {
output:
tuple val("${task.process}"), val('foo'), cmd(foo_version), topic: versions
tuple val("${task.process}"), val('bar'), cmd(bar_version), topic: versions
"""
# ...
"""
}
I think the real thing this could open up is parsing the version string in Groovy as another option:
// cmd + variable output
foo_version = getVersionFromString('foo --version')
bar_version = 'bar --version'
process FOOBAR {
output:
tuple val("${task.process}"), val('foo'), cmd(foo_version), topic: versions
}
in a lib far far away:
def getVersionFromString(String text) {
def matcher = text =~ /v(\d+\.\d+\.\d+)/
return matcher ? matcher[0][1] : null
}
Just a thought.
The thing is that the command must be executed in the task environment, because Nextflow might not have access to the tool from outside the task.
You could just emit the raw output of the tool version command, remove the duplicates, and then parse the string in Groovy:
process FOOBAR {
output:
tuple val("${task.process}"), val('foo'), cmd('foo --version'), topic: versions
}
Channel.topic('versions') .map { process, tool, raw_version ->
[ process, tool, getVersionFromString(tool, raw_version) ]
}
That comes down to whether you would rather parse the version with a Bash one-liner or Groovy code. Note that you have to write a custom parser for every tool, so putting it all in a lib far far away would break the modularity of your modules. Unless you have a way to "register" a parser from the module script.
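For comparison, a Bash one-liner that behaves like the `getVersionFromString` regex above might look like the following. It is a sketch: `parse_version` is a hypothetical name, and like the Groovy version it takes the first `vX.Y.Z` match and drops the leading `v`. It prints nothing when the pattern is absent, which shows why a single shared parser does not cover every tool's output format.

```bash
# Bash analogue of getVersionFromString(): print the first "vX.Y.Z" found
# on stdin without its leading "v"; print nothing if there is no match.
parse_version() {
    grep -oE 'v[0-9]+\.[0-9]+\.[0-9]+' | head -n 1 | cut -c 2-
}

# Header line from the salmon log earlier in this thread.
echo '### salmon (selective-alignment-based) v1.10.1' | parse_version   # prints 1.10.1
# A tool that prints "foo 1.2.3" (no "v") would need its own parser.
echo 'foo 1.2.3' | parse_version                                        # prints nothing
```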
Closing for now -- follow https://github.com/nf-core/fetchngs/pull/347 for latest updates