rnaseq Workflow output definition

This PR adds a workflow output definition based on https://github.com/nextflow-io/nextflow/pull/4784. I'm still working through the pipeline, but once I'm done, I will have completely replaced publishDir using the output DSL.

See also https://github.com/nf-core/fetchngs/pull/275 for ongoing discussion

Feb 28 '24 22:02 bentsherman

`nf-core lint` overall result: Passed :white_check_mark: :warning:

Posted for pipeline commit 783ff86

+| ✅ 170 tests passed       |+
#| ❔   7 tests were ignored |#
!| ❗   7 tests had warnings |!

:heavy_exclamation_mark: Test warnings:

files_exist - File not found: assets/multiqc_config.yml
files_exist - File not found: .github/workflows/awstest.yml
files_exist - File not found: .github/workflows/awsfulltest.yml
pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!

:grey_question: Tests ignored:

files_exist - File is ignored: conf/modules.config
nextflow_config - Config default ignored: params.ribo_database_manifest
files_unchanged - File ignored due to lint config: assets/email_template.html
files_unchanged - File ignored due to lint config: assets/email_template.txt
files_unchanged - File ignored due to lint config: .gitignore or .prettierignore or pyproject.toml
actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/rnaseq/rnaseq/.github/workflows/awstest.yml
multiqc_config - 'assets/multiqc_config.yml' not found

:white_check_mark: Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: assets/nf-core-rnaseq_logo_light.png
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/images/nf-core-rnaseq_logo_light.png
files_exist - File found: docs/images/nf-core-rnaseq_logo_dark.png
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: main.nf
files_exist - File found: conf/base.config
files_exist - File found: conf/igenomes.config
files_exist - File found: modules.json
files_exist - File found: pyproject.toml
files_exist - File not found check: Singularity
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: docs/images/nf-core-rnaseq_logo.png
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: lib/Utils.groovy
files_exist - File not found check: lib/WorkflowMain.groovy
files_exist - File not found check: lib/NfcoreTemplate.groovy
files_exist - File not found check: lib/WorkflowRnaseq.groovy
files_exist - File not found check: lib/nfcore_external_java_deps.jar
files_exist - File not found check: .travis.yml
nextflow_config - Config variable found: manifest.name
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: manifest.homePage
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: params.validationShowHiddenParams
nextflow_config - Config variable found: params.validationSchemaIgnoreParams
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config manifest.name began with nf-core/
nextflow_config - Config variable manifest.homePage began with https://github.com/nf-core/
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config manifest.version ends in dev: 3.15.0dev
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
nextflow_config - nextflow.config contains configuration profile test
nextflow_config - Config default value correct: params.hisat2_build_memory= 200.GB
nextflow_config - Config default value correct: params.gtf_extra_attributes= gene_name
nextflow_config - Config default value correct: params.gtf_group_features= gene_id
nextflow_config - Config default value correct: params.featurecounts_group_type= gene_biotype
nextflow_config - Config default value correct: params.featurecounts_feature_type= exon
nextflow_config - Config default value correct: params.igenomes_base= s3://ngi-igenomes/igenomes
nextflow_config - Config default value correct: params.trimmer= trimgalore
nextflow_config - Config default value correct: params.min_trimmed_reads= 10000
nextflow_config - Config default value correct: params.umitools_extract_method= string
nextflow_config - Config default value correct: params.umitools_grouping_method= directional
nextflow_config - Config default value correct: params.aligner= star_salmon
nextflow_config - Config default value correct: params.pseudo_aligner_kmer_size= 31
nextflow_config - Config default value correct: params.min_mapped_reads= 5.0
nextflow_config - Config default value correct: params.kallisto_quant_fraglen= 200
nextflow_config - Config default value correct: params.kallisto_quant_fraglen_sd= 200
nextflow_config - Config default value correct: params.deseq2_vst= true
nextflow_config - Config default value correct: params.rseqc_modules= bam_stat,inner_distance,infer_experiment,junction_annotation,junction_saturation,read_distribution,read_duplication
nextflow_config - Config default value correct: params.skip_bbsplit= true
nextflow_config - Config default value correct: params.skip_preseq= true
nextflow_config - Config default value correct: params.custom_config_version= master
nextflow_config - Config default value correct: params.custom_config_base= https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Config default value correct: params.max_cpus= 16
nextflow_config - Config default value correct: params.max_memory= 128.GB
nextflow_config - Config default value correct: params.max_time= 240.h
nextflow_config - Config default value correct: params.publish_dir_mode= copy
nextflow_config - Config default value correct: params.max_multiqc_email_size= 25.MB
nextflow_config - Config default value correct: params.validate_params= true
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - CODE_OF_CONDUCT.md matches the template
files_unchanged - LICENSE matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/CONTRIBUTING.md matches the template
files_unchanged - .github/ISSUE_TEMPLATE/bug_report.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/config.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/PULL_REQUEST_TEMPLATE.md matches the template
files_unchanged - .github/workflows/branch.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .github/workflows/linting.yml matches the template
files_unchanged - assets/sendmail_template.txt matches the template
files_unchanged - assets/nf-core-rnaseq_logo_light.png matches the template
files_unchanged - docs/images/nf-core-rnaseq_logo_light.png matches the template
files_unchanged - docs/images/nf-core-rnaseq_logo_dark.png matches the template
files_unchanged - docs/README.md matches the template
actions_ci - '.github/workflows/ci.yml' is triggered on expected events
actions_ci - '.github/workflows/ci.yml' checks minimum NF version
readme - README Nextflow minimum version badge matched config. Badge: 23.04.0, Config: 23.04.0
readme - README Zenodo placeholder was replaced with DOI.
pipeline_name_conventions - Name adheres to nf-core convention
template_strings - Did not find any Jinja template strings (531 files)
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: cloud_tests_small.yml
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: clean-up.yml
actions_schema_validation - Workflow validation passed: fix-linting.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
actions_schema_validation - Workflow validation passed: download_pipeline.yml
actions_schema_validation - Workflow validation passed: cloud_tests_full.yml
actions_schema_validation - Workflow validation passed: release-announcements.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'

Run details

nf-core/tools version 2.13
Run at 2024-02-28 22:32:03

Feb 28 '24 22:02 github-actions[bot]

Looking at this PR, we would be moving the logic of the publishing from the configuration to the pipeline. I'm quite happy with that, but:

I know some people like publishing being a property of configuration (not convinced myself).
Could this solution be part of the emit block to create one place for outputs?
Could a .publish() operator to achieve the same thing but with a simpler syntax?

Mar 05 '24 12:03 adamrtalbot

OK I've read this comment which has made it more clear.

We add a second structure to Nextflow, the output block, which includes the definitions about what should be published where. This falls outside of processes and workflows which makes it more flexible. The original idea was to use topics but that was imprecise so process selectors are being used instead.

Taking a look again, I think the general idea is sound but I'm not sold on the syntax. I tried to rewrite it the other way around but ended up just re-inventing publishDir but worse 😆

output {
    select: [
        '.*:QUANTIFY_STAR_SALMON:SALMON_QUANT',
        '.*:QUANTIFY_STAR_SALMON:CUSTOM_TX2GENE',
        '.*:QUANTIFY_STAR_SALMON:TXIMETA_TXIMPORT',
        '.*:QUANTIFY_STAR_SALMON:SE_.*'
    ], path: "${params.outdir}/${params.aligner}", mode: params.publish_dir_mode
    select: 'NFCORE_RNASEQ:RNASEQ:SAMTOOLS_SORT'            , pattern: '*.bam', path: "${params.outdir}/${params.aligner}"              , enabled: { params.save_align_intermeds || params.save_umi_intermeds }
    select: 'NFCORE_RNASEQ:RNASEQ:UMITOOLS_PREPAREFORSALMON', pattern: '*.log', path: "${params.outdir}/${params.aligner}/umitools/logs", enabled: { params.save_align_intermeds || params.save_umi_intermeds }
    select: 'NFCORE_RNASEQ:RNASEQ:UMITOOLS_PREPAREFORSALMON', pattern: '*.bam', path: "${params.outdir}/${params.aligner}"              , enabled: { params.save_align_intermeds || params.save_umi_intermeds }
}

Mar 05 '24 14:03 adamrtalbot

@adamrtalbot check out the rnaseq PR that I linked above, it is less verbose than what you did. The syntax has evolved somewhat since Paolo's original proposal. Even so, the workflow outputs for rnaseq will be complicated no matter how you slice it.

Mar 05 '24 21:03 bentsherman

Liking this. If we really need the output block (rather than doing something with emit), this is a nice readable way of doing it.

Apr 04 '24 09:04 pinin4fjords

This is beginning to look great. All the publishing logic is in one location, easy to review and understand where it's coming from. There are two downsides to this approach:

You need to track back to the channel to find what's in there, which could be a little tricky.
It's quite verbose (there's a lot of text in one place). But then I would prefer explicit and verbose to implicit and concise.

Apr 04 '24 10:04 adamrtalbot

This is beginning to look great. All the publishing logic is in one location, easy to review and understand where it's coming from. There are two downsides to this approach:
1. You need to track back to the channel to find what's in there, which could be a little tricky.

2. It's quite verbose (there's a lot of text in one place). But then I would prefer explicit and verbose to implicit and concise.

Agreeing with Adam, it's a bit too implicit, especially what is a path what is a topic

Apr 04 '24 10:04 maxulysse

In the Nextflow PR there are some docs which explain the feature in more detail. Unfortunately the deploy preview isn't working so you'll have to look at the diff

You need to track back to the channel to find what's in there, which could be a little tricky.

Indeed this is the downside of selecting channels instead of processes. More flexible but more layers of indirection. We should be able to alleviate this with IDE tooling, i.e. hover over a selected channel to see it's definition

If we really need the output block (rather than doing something with emit), this is a nice readable way of doing it

Thanks @pinin4fjords , I never responded to your idea about putting everything in the emit section, but basically I think that would be way too cumbersome, imagine trying to fit the rnaseq outputs into the emits 😅

The main question now is, how to bring the outputs for PREPARE_GENOME and RNASEQ up to the top-level workflow? I was thinking some kind of include statement, otherwise we would have to pass a LOT of channels up through emits and/or topics.

Apr 04 '24 12:04 bentsherman

The current prototype simply maps the output channels to the publish directory structure, but we still need to get these outputs to the top level whereas currently they are nested under NFCORE_RNASEQ:...

Before I go off and add a gajillion channels to the emit section, I'd like to see if I can simplify things with topics.

@adamrtalbot @pinin4fjords @maxulysse @ewels Since you guys understand this pipeline better than me, I'm wondering, how would you group all of these outputs if you could group them any way you want? You are no longer restricted to process selectors or directory names, but you could use those if you wanted.

For example, I see the modules config for RNASEQ is grouped with these comments:

STAR Salmon alignment
General alignment
bigwig coverage
DESeq2 QC
Pseudo-alignment

Would those be good top-level groupings for outputs? Then you might have topics called align-star-salmon, align, bigwig, deseq2, etc. Or would you organize it differently?

Apr 08 '24 20:04 bentsherman

I managed to move everything to the top-level workflow, so it should be executable now (though there are likely some bugs, will test tomorrow).

I ended up using topics for everything, using the various publish directories to guide the topic names. Hope this gives you a more concrete sense of how topics are useful.

The topics don't really reduce the amount of code, they just split it between the output DSL and the workflow topic: section. In a weird way, this provides some modularity, since workflows can define some ontology of topics which can in turn be used by the output DSL for publishing.

Apr 09 '24 04:04 bentsherman

As Evan mentioned on Slack, this does seem very verbose:

    QUANTIFY_STAR_SALMON.out.results                        >> 'align'
    QUANTIFY_STAR_SALMON.out.tpm_gene                       >> 'align'
    QUANTIFY_STAR_SALMON.out.counts_gene                    >> 'align'
    QUANTIFY_STAR_SALMON.out.lengths_gene                   >> 'align'
    QUANTIFY_STAR_SALMON.out.counts_gene_length_scaled      >> 'align'
    QUANTIFY_STAR_SALMON.out.counts_gene_scaled             >> 'align'
    QUANTIFY_STAR_SALMON.out.tpm_transcript                 >> 'align'
    QUANTIFY_STAR_SALMON.out.counts_transcript              >> 'align'
    QUANTIFY_STAR_SALMON.out.lengths_transcript             >> 'align'

But I understand why, since if even one of the outputs from a process needs to go to a different topic then you can't use the multi-channel object QUANTIFY_STAR_SALMON.out.

Rather than doing this from the calling workflow, could e.g. QUANTIFY_STAR_SALMON use a topic as part of its emit, to 'suggest' a classification for that channel?

emit:
    results = ch_pseudo_results, topic = 'tables'

Then, if we all used good standards (e.g. an ontology for topics for outputs), calling workflows could have very minimal logic for this, relying on what the components said about their outputs. The calling workflow would only need to decide what to do with the topics in its outputs.

Apr 09 '24 09:04 pinin4fjords

We can definitely move some of these topic mappings into the modules and subworkflows, that was going to be my next step. I also suspect that nf-core will be able to converge on a shared ontology for these things.

I'd still rather keep the topic mapping separate from the emits though, as we will need the topic: section either way and we're trying to minimize the number of ways to do the same thing

Apr 09 '24 14:04 bentsherman

I moved most of the topic mappings into their respective subworkflows. It gets tricky when a workflow is used multiple times under a different name and with different publish behavior.

For example, QUANTIFY_PSEUDO_ALIGNMENT is used twice in RNASEQ, once as itself and once as the alias QUANTIFY_STAR_SALMON. One publishes to the folder "${params.aligner}" while the other publishes to "${params.pseudo_aligner}".

I can't set a "sensible default" in the subworkflow because I can't override the default later, I can only specify additional topics. Or I could specify a default and not use it in the output definition for rnaseq, instead re-mapping each alias to different topics as I am currently doing.

However, keeping the topic mappings in the RNASEQ workflow is also tricky because the process/workflow might not be executed, in which case the topic mapping will fail. We might need to replicate the control flow in the topic: section:

  topic:
  if{ !params.skip_alignment && params.aligner == 'star_rsem' ) {
    DESEQ2_QC_RSEM.out.rdata                >> 'align-deseq2'
    DESEQ2_QC_RSEM.out.pca_txt              >> 'align-deseq2'
    DESEQ2_QC_RSEM.out.pdf                  >> 'align-deseq2'
    DESEQ2_QC_RSEM.out.dists_txt            >> 'align-deseq2'
    DESEQ2_QC_RSEM.out.size_factors         >> 'align-deseq2'
    DESEQ2_QC_RSEM.out.log                  >> 'align-deseq2'
  }

Totally doable, but unfortunate if we have to resort to it

@adamrtalbot noted in Slack that most Nextflow pipelines don't come close to this level of complexity, so I wouldn't be opposed to moving forward with what we have and let the rnaseq maintainers sort out the details. Though we do need to address the last point about conditional topic mappings

Apr 09 '24 21:04 bentsherman

I'm loving the principle:

    "${params.aligner}" {
        'log' {
            from 'align-star-log'
        }

        from 'align-star-intermeds'

        'unmapped' {
            from 'align-star-unaligned'
        }
    }

The multi import thing didn't occur to me. Could we use a variable sent in via the meta or somesuch to control the topic something gets sent to?

Apr 10 '24 08:04 pinin4fjords

For example, QUANTIFY_PSEUDO_ALIGNMENT is used twice in RNASEQ, once as itself and once as the alias QUANTIFY_STAR_SALMON. One publishes to the folder "${params.aligner}" while the other publishes to "${params.pseudo_aligner}".

In this case I would manipulate the channel to what I wanted. If I had to use a topic I would use them at the last second. So again, as long as topics are optional I think everything can be handled reasonably well.

However, keeping the topic mappings in the RNASEQ workflow is also tricky because the process/workflow might not be executed, in which case the topic mapping will fail. We might need to replicate the control flow in the topic: section:

Presumably, if a topic is empty it just doesn't publish anything? So you could add stuff from an empty channel and you would end up with an empty topic. In your example, it would make more sense to fix the rnaseq code so it doesn't rely on lots of if statements, which would end up looking like this:

  topic:
  deseq2_qc_rdata        >> 'align-deseq2'
  deseq2_qc_pca_txt      >> 'align-deseq2'
  deseq2_qc_pdf          >> 'align-deseq2'
  deseq2_qc_dists_txt    >> 'align-deseq2'
  deseq2_qc_size_factors >> 'align-deseq2'
  deseq2_qc_log          >> 'align-deseq2'

Even better, just tidy up the channels before making the topic:

  topic:
  deseq2_qc >> 'align-deseq2'

I think my overall impression is topics are a nice sugar on top of existing channels, in which case most of the key logic should be in the channel manipulations. Topics are a way of turning a local channel into a global one and should do very little else.

One publishes to the folder "${params.aligner}" while the other publishes to "${params.pseudo_aligner}".

That sounds like a bug 😆

Apr 10 '24 11:04 adamrtalbot

Notes on the latest update:

Topics are no longer used. Nextflow simply maintains a global map of channels to "rules" under the hood
The output DSL is no longer a potentially nested directory structure, it's just a flat list of rules. Each rule can specify publish options for channels that are sent to the rule
In principle, the rule name can be anything. In practice, it is convenient to make it the default publish path. If you're happy with that, you don't need to configure anything else and Nextflow will use it as the publish path
Processes and workflows can have a publish: section to define these mappings. A process can map emits to rules, a workflow can map channels to rules
The output DSL is used only to (1) set the output directory, (2) set default publish options like mode, (3) customize rules as needed
In general, rules need to be customized only when the path should be different or additional options like enabled are needed. If you can align your output directory with the module/workflow defaults, your output definition can be quite short (see fetchngs)
If a process maps some emits to some rules and then is invoked by a workflow, the workflow can re-map the process outputs to different rules and overwrite the process defaults, and so on with workflows and subworkflows, etc

Overall, everything is much more concise and more in line with what many people have suggested, to simply annotate the workflows with the publish paths. The output definition is no longer a comprehensive view of all outputs, but there is a degree of modularity, and you can be verbose in the output definition if you want to.

@adamrtalbot thanks for your comments, makes me feel more confident about the prototype. I think all of the remaining TODOs can be addressed by refactoring some dataflow logic, it can be handled by the rnaseq devs.

Apr 11 '24 19:04 bentsherman

Really liking the way this is going now, it's going to be very tidy.

Would it be feasible at some point to use some optional dynamism in the modules, to facilitate repeated usage?

    publish:
    ch_orig_bam         >> "star_salmon/intermeds/${meta.publish_suffix}/"

Apr 12 '24 08:04 pinin4fjords

Would it be feasible at some point to use some optional dynamism in the modules, to facilitate repeated usage?

Maybe in a future iteration. But related to this, we are interested in building on the concept of the samplesheet as a way to hold metadata for file collections in general, and it might be a better practice than trying to encode metadata in the filenames.

For example Paolo has proposed that we have a method in the output DSL to automatically publish an index file for a given "rule":

output {
  directory 'results'

  'star_salmon/intermeds/' {
    index 'index.csv'
  }
}

star_salmon/intermeds/index.csv

sample_id,bam
sample001,results/star_salmon/intermeds/sample001.bam
sample002,results/star_salmon/intermeds/sample002.bam
sample003,results/star_salmon/intermeds/sample003.bam

Of course you could also do this manually like in fetchngs, and I would like to add a stdlib function like mergeCsv to make it easier, but the index method would be a convenient solution for the most common and simple cases. Either way, you can just query the index file instead of inspecting the file names.

Apr 12 '24 13:04 bentsherman

The redirect to null simplifies the top-level publish def somewhat. The remaining rules could also be moved into the workflow defs since they only rename paths. It just might be more verbose since you would have to remap each channel instead of the target name.

It seems like the best delineation for what goes in the top-level publish block vs the workflow publish sections is, the workflows define what is published (including conditional logic) while the top-level publish def should define how things are being published (mode, whether to overwrite, content type, tags, etc). This is also good for modularity.

Note that some subworkflows are now using params which is an anti-pattern. For this I recommend passing those params as workflow inputs to keep things modular.

Apr 24 '24 22:04 bentsherman

Note that some subworkflows are now using params which is an anti-pattern. For this I recommend passing those params as workflow inputs to keep things modular.

We have been trying to eliminate that when we see it

Apr 25 '24 08:04 pinin4fjords

rnaseq rnaseq copied to clipboard

Workflow output definition

nf-core lint overall result: Passed :white_check_mark: :warning:

:heavy_exclamation_mark: Test warnings:

:grey_question: Tests ignored:

:white_check_mark: Tests passed:

Run details

rnaseq
rnaseq copied to clipboard

`nf-core lint` overall result: Passed :white_check_mark: :warning: