rnaseq Use eval output for tool versions

This PR uses the experimental cmd output type in https://github.com/nextflow-io/nextflow/pull/4493 to simplify the collection of tool versions.

Once topic channel support is merged into Nextflow, we can merge this PR with #1109 to simplify things further. Instead of emitting versions1, versions2, etc. for processes with multiple tools, we can simply send them all to the 'versions' topic.

PR checklist

[ ] This comment contains a description of changes (with reason).
[ ] Make sure your code lints (nf-core lint).
[ ] Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
[ ] Usage Documentation in docs/usage.md is updated.
[ ] Output Documentation in docs/output.md is updated.
[ ] CHANGELOG.md is updated.
[ ] README.md is updated (including new tool citations and authors/contributors).

Nov 15 '23 22:11 bentsherman

For some reason the SALMON_QUANT process is failing:

Caused by:
  Process `NFCORE_RNASEQ:RNASEQ:QUANTIFY_PSEUDO_ALIGNMENT:SALMON_QUANT (RAP1_UNINDUCED_REP2)` terminated with an error exit status (1)

Command executed:

  salmon quant \
      --geneMap genome_gfp.gtf \
      --threads 2 \
      --libType=SR \
      --index salmon \
      -r RAP1_UNINDUCED_REP2_primary.fastq.gz \
       \
      -o RAP1_UNINDUCED_REP2
  
  if [ -f RAP1_UNINDUCED_REP2/aux_info/meta_info.json ]; then
      cp RAP1_UNINDUCED_REP2/aux_info/meta_info.json "RAP1_UNINDUCED_REP2_meta_info.json"
  fi

Command exit status:
  1

Command output:
  (empty)

Command error:
Version Info: This is the most recent version of salmon.
### salmon (selective-alignment-based) v1.10.1
### [ program ] => salmon 
### [ command ] => quant 
### [ geneMap ] => { genome_gfp.gtf }
### [ threads ] => { 2 }
### [ libType ] => { SR }
### [ index ] => { salmon }
### [ unmatedReads ] => { RAP1_UNINDUCED_REP2_primary.fastq.gz }
### [ output ] => { RAP1_UNINDUCED_REP2 }
Logs will be written to RAP1_UNINDUCED_REP2/logs
[2023-11-15 06:41:55.947] [jointLog] [info] setting maxHashResizeThreads to 2
-----------------------------------------
| Loading contig table | Time = 63.579 us
-----------------------------------------
size = 840
[2023-11-15 06:41:55.947] [jointLog] [info] Fragment incompatibility prior below threshold.  Incompatible fragments will be ignored.
[2023-11-15 06:41:55.947] [jointLog] [info] Usage of --validateMappings implies use of minScoreFraction. Since not explicitly specified, it is being set to 0.65
[2023-11-15 06:41:55.947] [jointLog] [info] Setting consensusSlack to selective-alignment default of 0.35.
[2023-11-15 06:41:55.947] [jointLog] [info] parsing read library format
[2023-11-15 06:41:55.947] [jointLog] [info] There is 1 library.
[2023-11-15 06:41:55.947] [jointLog] [info] Loading pufferfish index
[2023-11-15 06:41:55.947] [jointLog] [info] Loading dense pufferfish index.
-----------------------------------------
| Loading contig offsets | Time = 1.6262 ms
-----------------------------------------
-----------------------------------------
| Loading reference lengths | Time = 3.809 us
-----------------------------------------
-----------------------------------------
| Loading mphf table | Time = 77.721 us
-----------------------------------------
size = 247935
Number of ones: 839
Number of ones per inventory item: 512
Inventory entries filled: 2
-----------------------------------------
| Loading contig boundaries | Time = 478.93 us
-----------------------------------------
size = 247935
-----------------------------------------
| Loading sequence | Time = 95.192 us
-----------------------------------------
size = 222765
-----------------------------------------
| Loading positions | Time = 346.52 us
-----------------------------------------
size = 381211
-----------------------------------------
| Loading reference sequence | Time = 86.075 us
-----------------------------------------
-----------------------------------------
| Loading reference accumulative lengths | Time = 2.828 us
-----------------------------------------
[2023-11-15 06:41:55.950] [jointLog] [info] done
[2023-11-15 06:41:56.000] [jointLog] [info] Index contained 126 targets
[2023-11-15 06:41:56.000] [jointLog] [info] Number of decoys : 1
[2023-11-15 06:41:56.000] [jointLog] [info] First decoy index : 125 

Error: no valid ID found for GFF record

Maybe dev is broken?

Nov 15 '23 22:11 bentsherman

Fetching upstream fixed it 👍

Nov 15 '23 22:11 bentsherman

`nf-core lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit e68f451

✅ 144 tests passed
❔ 6 tests were ignored
❗ 5 tests had warnings

❗ Test warnings:

files_exist - File not found: .github/workflows/awstest.yml
files_exist - File not found: .github/workflows/awsfulltest.yml
nextflow_config - Config manifest.version should end in dev: 3.13.0
pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
pipeline_todos - TODO string in WorkflowRnaseq.groovy: Optionally add in-text citation tools to this list.

❔ Tests ignored:

files_unchanged - File ignored due to lint config: assets/email_template.html
files_unchanged - File ignored due to lint config: assets/email_template.txt
files_unchanged - File ignored due to lint config: lib/NfcoreTemplate.groovy
files_unchanged - File ignored due to lint config: .gitignore or .prettierignore or pyproject.toml
actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/rnaseq/rnaseq/.github/workflows/awstest.yml
multiqc_config - multiqc_config

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: assets/nf-core-rnaseq_logo_light.png
files_exist - File found: conf/modules.config
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/images/nf-core-rnaseq_logo_light.png
files_exist - File found: docs/images/nf-core-rnaseq_logo_dark.png
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: lib/nfcore_external_java_deps.jar
files_exist - File found: lib/NfcoreTemplate.groovy
files_exist - File found: lib/Utils.groovy
files_exist - File found: lib/WorkflowMain.groovy
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: conf/igenomes.config
files_exist - File found: lib/WorkflowRnaseq.groovy
files_exist - File found: modules.json
files_exist - File found: pyproject.toml
files_exist - File not found check: Singularity
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: docs/images/nf-core-rnaseq_logo.png
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: .travis.yml
nextflow_config - Config variable found: manifest.name
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: manifest.homePage
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: params.validationShowHiddenParams
nextflow_config - Config variable found: params.validationSchemaIgnoreParams
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config manifest.name began with nf-core/
nextflow_config - Config variable manifest.homePage began with https://github.com/nf-core/
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - CODE_OF_CONDUCT.md matches the template
files_unchanged - LICENSE matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/CONTRIBUTING.md matches the template
files_unchanged - .github/ISSUE_TEMPLATE/bug_report.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/config.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/PULL_REQUEST_TEMPLATE.md matches the template
files_unchanged - .github/workflows/branch.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .github/workflows/linting.yml matches the template
files_unchanged - assets/sendmail_template.txt matches the template
files_unchanged - assets/nf-core-rnaseq_logo_light.png matches the template
files_unchanged - docs/images/nf-core-rnaseq_logo_light.png matches the template
files_unchanged - docs/images/nf-core-rnaseq_logo_dark.png matches the template
files_unchanged - docs/README.md matches the template
files_unchanged - lib/nfcore_external_java_deps.jar matches the template
actions_ci - '.github/workflows/ci.yml' is triggered on expected events
actions_ci - '.github/workflows/ci.yml' checks minimum NF version
readme - README Nextflow minimum version badge matched config. Badge: 23.04.0, Config: 23.04.0
readme - README Zenodo placeholder was replaced with DOI.
pipeline_name_conventions - Name adheres to nf-core convention
template_strings - Did not find any Jinja template strings (257 files)
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: release-announcments.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
actions_schema_validation - Workflow validation passed: cloud_tests_small.yml
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: clean-up.yml
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: fix-linting.yml
actions_schema_validation - Workflow validation passed: cloud_tests_full.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'

Run details

nf-core/tools version 2.10
Run at 2023-11-16 01:26:58

Nov 15 '23 22:11 github-actions[bot]

@drpatelh @ewels now that Nextflow has channel topics, it occurred to me that we could actually simplify a lot by just using env outputs. See my comment here, but I will copy the example code to illustrate my point:

// current nf-core convention
process FOOBAR {
    output:
    path 'versions.yml', topic: versions

    """
    # ...

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        foo: \$(foo --version)
        bar: \$(bar --version)
    END_VERSIONS
    """
}

// env output
process FOOBAR {
    output:
    tuple val("${task.process}"), val('foo'), env(FOO_VERSION), topic: versions
    tuple val("${task.process}"), val('bar'), env(BAR_VERSION), topic: versions

    """
    # ...

    FOO_VERSION=\$(foo --version)
    BAR_VERSION=\$(bar --version)
    """
}

// cmd output
process FOOBAR {
    output:
    tuple val("${task.process}"), val('foo'), cmd('foo --version'), topic: versions
    tuple val("${task.process}"), val('bar'), cmd('bar --version'), topic: versions

    """
    # ...
    """
}

I would love to hear what you guys think about (2) vs (3), i.e. the env output vs the cmd output. Keep in mind that in all three cases, the tool version commands are executed in the task script in more or less the same way.

Nov 27 '23 22:11 bentsherman

My preference is for option 3 - the new cmd style. I like keeping the version commands out of the script, as it makes the script commands much cleaner and easier to read.

NB: The versions1/versions2 stuff in the PR code diff can be simplified after https://github.com/nf-core/rnaseq/pull/1109 is merged. This new syntax is shown in Ben's comment.

Dec 11 '23 14:12 ewels

My preference goes to version2, which I find more explicit, easier to read, but I do love the version3 that removes completely the version generation from the script itself.

Dec 11 '23 15:12 maxulysse

I like three, my only concern is some of the commands to get the version get pretty long. In theory, we could do something like:

def foo_version = 'foo --version'

output:
    tuple val("${task.process}"), val('foo'), cmd("${foo_version}"), topic: versions

Dec 11 '23 15:12 edmundmiller

I like option 3! But I wonder about modules with R or python scripts, where we use those languages to create the versions.yml instead of bash. Will this work? or do we have to continue using the old syntax for these cases?

Dec 11 '23 15:12 mirpedrol

I would really like the inputs/outputs section to remain as concise as possible, and I like the separation of concerns where the command to produce the output happens in more or less the same place. I'd do a bit of a WTF if people suddenly started embedding extensive process stuff where I expect the I/O.

So I have a fairly strong dislike for option 3), I think some fairly horrific stuff could happen there and make the processes hard to understand.

So option 2) for me please!

Dec 11 '23 15:12 pinin4fjords

Agreeing with @pinin4fjords there: version3 looks beautiful as long as everything works well, but when it starts to bug, it's a mess to debug.

Dec 11 '23 15:12 maxulysse

@pinin4fjords - note that one of the limitations of cmd (which will be documented) is that it doesn't support newlines.

That will hopefully prevent people from doing anything too horrendous 😆

We could have an nf-core modules linting rule that checks the string length and fails if it's too long, suggesting that people use env in that particular case instead.
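
A minimal sketch of such a check, written as a shell function purely for illustration. The name `lint_version_cmd` and the 80-character limit are assumptions, not part of nf-core/tools, and a real rule would live in the nf-core linting code rather than in a shell script.

```shell
# Assumed threshold for illustration; not an nf-core setting
MAX_CMD_LEN=80
NEWLINE='
'

# Hypothetical check: returns 0 if the command suits a cmd output, 1 otherwise
lint_version_cmd() {
    cmd=$1
    # cmd outputs do not support newlines, so reject them outright
    case $cmd in
        *"$NEWLINE"*)
            echo "FAIL: command contains a newline; use an env output instead"
            return 1 ;;
    esac
    # Long one-liners are hard to read in the output block
    if [ "${#cmd}" -gt "$MAX_CMD_LEN" ]; then
        echo "FAIL: command is ${#cmd} characters (limit $MAX_CMD_LEN); use an env output instead"
        return 1
    fi
    echo "PASS"
}

# A short command passes; a long pipeline is flagged
lint_version_cmd 'foo --version'
lint_version_cmd "foo --version 2>&1 | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1 | tr -d '[:space:]'" || true
```

A length limit is a blunt instrument: it would not catch a short but obscure pipeline, so it could only complement review, not replace it.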

Dec 11 '23 15:12 ewels

> @pinin4fjords - note that one of the limitations of cmd (which will be documented) is that it doesn't support newlines.
>
> That will hopefully prevent people from doing anything too horrendous 😆

There's plenty of evil to be done with pipes!

Dec 11 '23 15:12 pinin4fjords

> But I wonder about modules with R or python scripts, where we use those languages to create the versions.yml instead of bash. Will this work? or do we have to continue using the old syntax for these cases?

@mirpedrol - No, it won't work. The suggestion would be to use env in these cases, as in option 2 (no need for the old syntax with the cat <<-END_VERSIONS stuff). But there are relatively few of these non-Bash modules; none in rnaseq, for example, I think.

Dec 11 '23 15:12 ewels

rnaseq has a couple of R modules actually; they're just not obvious because they're local. We will hopefully fix that at some point, and they will then need templates etc.

Dec 11 '23 16:12 pinin4fjords

Thank you all for your feedback. I still prefer env myself, but Paolo is determined now to add the cmd type, so we will have both and you can use whichever one you prefer.

> My preference goes to version2, which I find more explicit, easier to read, but I do love the version3 that removes completely the version generation from the script itself.

Note that the cmd type is still executed in the task script just like an env, it just inserts the command for you
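
To make that equivalence concrete, here is a conceptual shell sketch of what ends up running inside the task in both cases. It is not the wrapper code Nextflow actually generates; `foo` is a stand-in function and `capture_version` is a hypothetical helper used only for illustration.

```shell
# Stand-in for a real tool; prints a version banner
foo() { echo "foo 1.2.3"; }

# env style: the module author captures the version in the script block
FOO_VERSION=$(foo --version)

# cmd style: the author only supplies the command string, and something
# equivalent to the capture above is run on their behalf (hypothetical helper)
capture_version() { eval "$1"; }
FOO_VERSION_FROM_CMD=$(capture_version 'foo --version')

# Both routes yield the same value inside the task environment
echo "env: $FOO_VERSION"
echo "cmd: $FOO_VERSION_FROM_CMD"
```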

> I like three, my only concern is some of the commands to get the version get pretty long.

@Emiller88 I don't think you can reference local variables in an output as in your example, but you could reference a global variable, for example:

foo_version = 'really | long | version | command'

process foo {
  output:
  cmd("${foo_version}")
}

> But I wonder about modules with R or python scripts, where we use those languages to create the versions.yml instead of bash. Will this work? or do we have to continue using the old syntax for these cases?

@mirpedrol In this PR I changed all the processes to only emit the metadata, and the YAML is then constructed at the end of the pipeline. If you usually generate the tool version from within a Python or R script, the cmd output could do something like python script.py --version to retrieve the version from Bash. If the process script itself is not Bash, however, then the cmd output won't work. So whenever cmd isn't supported or would be unwieldy to use, you can always fall back to an env output.

> note that one of the limitations of cmd (which will be documented) is that it doesn't support newlines

You could have a multi-line command by using semicolons in place of newlines 😅
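
As a small illustration of the semicolon trick, the sketch below runs three statements on a single line. The `foo` function is a stand-in for a tool with a multi-line version banner and is an assumption made for the example.

```shell
# Stand-in for a tool with a multi-line version banner
foo() { printf 'foo: an example tool\nversion 1.2.3\n'; }

# Three statements on one line, separated by semicolons instead of newlines:
# capture the banner, strip everything up to "version ", print the remainder
out=$(foo --version); ver=${out##*version }; echo "$ver"
```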

Regarding multi-line outputs, we found a way to support them for both env and cmd. So whereas currently env outputs are squashed to a single line, both will support multi-line output going forward.

Dec 11 '23 23:12 bentsherman

So I think everyone agrees that options 2 + 3 from Ben's comment (env and cmd) are both improvements ✅

For any processes with script blocks written in languages other than Bash, we will have to use the env approach. For Bash commands I now see three options, which maybe we can vote on in the nf-core Slack:

Option 1: `env`

// env output
process FOOBAR {
    output:
    tuple val("${task.process}"), val('foo'), env(FOO_VERSION), topic: versions
    tuple val("${task.process}"), val('bar'), env(BAR_VERSION), topic: versions

    """
    # ...

    FOO_VERSION=\$(foo --version)
    BAR_VERSION=\$(bar --version)
    """
}

Option 2: `cmd`

// cmd output
process FOOBAR {
    output:
    tuple val("${task.process}"), val('foo'), cmd('foo --version'), topic: versions
    tuple val("${task.process}"), val('bar'), cmd('bar --version'), topic: versions

    """
    # ...
    """
}

Option 3: `cmd` + variable

// cmd + variable output
foo_version = 'foo --version'
bar_version = 'bar --version'
process FOOBAR {
    output:
    tuple val("${task.process}"), val('foo'), cmd(foo_version), topic: versions
    tuple val("${task.process}"), val('bar'), cmd(bar_version), topic: versions

    """
    # ...
    """
}

Dec 12 '23 09:12 ewels

I think the real thing this could open up is parsing the version string in Groovy as another option:

// cmd + variable output
foo_version = getVersionFromString('foo --version')
bar_version = 'bar --version'

process FOOBAR {
    output:
    tuple val("${task.process}"), val('foo'), cmd(foo_version), topic: versions
}

in a lib far far away:

def getVersionFromString(String text) {
    def matcher = text =~ /v(\d+\.\d+\.\d+)/
    return matcher ? matcher[0][1] : null
}

Just a thought.

Dec 12 '23 21:12 edmundmiller

The thing is that the command must be executed in the task environment, because Nextflow might not have access to the tool from outside the task.

You could just emit the raw output of the tool version command, remove the duplicates, and then parse the string in Groovy:

process FOOBAR {
    output:
    tuple val("${task.process}"), val('foo'), cmd('foo --version'), topic: versions
}

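Channel.topic('versions')
    .unique()  // remove duplicates emitted by repeated tasks
    .map { process, tool, raw_version ->
        // parse the raw tool output in Groovy, one parser per tool
        [ process, tool, getVersionFromString(tool, raw_version) ]
    }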

That comes down to whether you would rather parse the version with a Bash one-liner or Groovy code. Note that you have to write a custom parser for every tool, so putting it all in a lib far far away would break the modularity of your modules. Unless you have a way to "register" a parser from the module script.
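
For comparison, the Bash side of that trade-off might look like the following sketch. The `parse_version` helper is an illustrative assumption; real tools print versions in many formats, so each module would still need its own pattern.

```shell
# Hypothetical helper: extract the first X.Y.Z-looking token from stdin
parse_version() {
    grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Sample banner, modelled on the salmon log line earlier in this thread
echo "### salmon (selective-alignment-based) v1.10.1" | parse_version
```

Keeping the pattern next to the version command, inside the module, avoids the modularity problem described above, at the cost of repeating similar one-liners across modules.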

Dec 12 '23 21:12 bentsherman

Closing for now -- follow https://github.com/nf-core/fetchngs/pull/347 for latest updates

May 16 '25 02:05 bentsherman
