
Add toolsheet-related implementations

Open · suzannejin opened this issue 10 months ago • 26 comments

Changes made to enable the usage of toolsheets:

  • [x] Add predefined toolsheets -> these define the combinations of tools allowed for each study type
  • [x] Add toolsheet_custom flag -> needed later for the benchmark functionality
  • [x] Merge the gsea_run and gprofiler2_run flags into a single functional_method flag -> more consistent with the ch_tools behaviour
  • [x] Update the report Rmd file to handle the differential_method and functional_method flags
  • [x] Add code to parse ch_tools from the toolsheets
  • [x] Handle the results channels inside the workflow so that they account for the tools used
  • [x] Make ch_tools args accessible to modules.config through meta.args
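
For context on what a toolsheet might contain, here is a minimal sketch. The column names and values are illustrative assumptions for this description, not the actual schema shipped in the PR:

```csv
# hypothetical predefined toolsheet for study_type = rnaseq (schema illustrative)
differential_method,functional_method
deseq2,gsea
limma,gprofiler2
```

Each row defines one allowed tool combination. Once the sheet is parsed into ch_tools, per-tool arguments can be exposed to conf/modules.config via meta.args, along these lines (the selector name and meta key are assumptions, not necessarily what the PR uses):

```groovy
// hypothetical conf/modules.config fragment
process {
    withName: 'DESEQ2_DIFFERENTIAL' {
        // forward arguments carried on the meta map from the toolsheet row
        ext.args = { meta.args ?: '' }
    }
}
```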

Nice to have, but left for next PRs (to not extend this one too much):

  • [ ] Replace the checks on params.functional_method, etc. with checks on ch_tools
  • [ ] Add checks for the analysis_name flag
  • [ ] Remove tool-specific fixed params (e.g. differential_file_suffix) from the user params scope (having too many params is confusing) and define them inside the workflow based on the chosen method
  • [ ] Add tests for gprofiler2 runs
  • [ ] Clean up some redundant checks/code/tests related to study_type vs tool combinations - previously these had to be checked explicitly, but now the predefined toolsheets dictate which combinations can go together
  • [ ] Create ch_background based on ch_tools instead of params

PR checklist

  • [x] This comment contains a description of changes (with reason).
  • [x] If you've fixed a bug or added code that should be tested, add tests!
  • [x] If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • [ ] If necessary, also make a PR on the nf-core/differentialabundance branch on the nf-core/test-datasets repository.
  • [x] Make sure your code lints (nf-core lint).
  • [x] Ensure the test suite passes (nf-test test main.nf.test -profile test,docker).
  • [ ] Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • [x] Usage Documentation in docs/usage.md is updated.
  • [ ] Output Documentation in docs/output.md is updated.
  • [x] CHANGELOG.md is updated.
  • [x] README.md is updated (including new tool citations and authors/contributors).

suzannejin avatar Mar 06 '25 16:03 suzannejin

nf-core pipelines lint overall result: Passed :white_check_mark:

Posted for pipeline commit 641ff2d

  ✅ 294 tests passed
  ❔ 7 tests were ignored
  ❗ 7 tests had warnings

:heavy_exclamation_mark: Test warnings:

  • pipeline_todos - TODO string in nextflow.config: Update the field with the details of the contributors to your pipeline. New with Nextflow version 24.10.0
  • pipeline_todos - TODO string in ro-crate-metadata.json: the "description" field still contains the unedited template README, including TODO placeholders for the introduction summary, workflow figure, example samplesheet/usage commands, contributor list, tool bibliography, and Zenodo citation DOI
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes
  • schema_lint - Parameter input is not defined in the correct subschema (input_output_options)

:grey_question: Tests ignored:

:white_check_mark: Tests passed:

  • files_exist - File found: .gitattributes
  • files_exist - File found: .gitignore
  • files_exist - File found: .nf-core.yml
  • files_exist - File found: .editorconfig
  • files_exist - File found: .prettierignore
  • files_exist - File found: .prettierrc.yml
  • files_exist - File found: CHANGELOG.md
  • files_exist - File found: CITATIONS.md
  • files_exist - File found: CODE_OF_CONDUCT.md
  • files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
  • files_exist - File found: nextflow_schema.json
  • files_exist - File found: nextflow.config
  • files_exist - File found: README.md
  • files_exist - File found: .github/.dockstore.yml
  • files_exist - File found: .github/CONTRIBUTING.md
  • files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
  • files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
  • files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
  • files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
  • files_exist - File found: .github/workflows/branch.yml
  • files_exist - File found: .github/workflows/ci.yml
  • files_exist - File found: .github/workflows/linting_comment.yml
  • files_exist - File found: .github/workflows/linting.yml
  • files_exist - File found: assets/email_template.html
  • files_exist - File found: assets/email_template.txt
  • files_exist - File found: assets/sendmail_template.txt
  • files_exist - File found: assets/nf-core-differentialabundance_logo_light.png
  • files_exist - File found: conf/modules.config
  • files_exist - File found: conf/test.config
  • files_exist - File found: conf/test_full.config
  • files_exist - File found: docs/images/nf-core-differentialabundance_logo_light.png
  • files_exist - File found: docs/images/nf-core-differentialabundance_logo_dark.png
  • files_exist - File found: docs/output.md
  • files_exist - File found: docs/README.md
  • files_exist - File found: docs/usage.md
  • files_exist - File found: main.nf
  • files_exist - File found: conf/base.config
  • files_exist - File found: conf/igenomes.config
  • files_exist - File found: conf/igenomes_ignored.config
  • files_exist - File found: .github/workflows/awstest.yml
  • files_exist - File found: .github/workflows/awsfulltest.yml
  • files_exist - File found: modules.json
  • files_exist - File found: ro-crate-metadata.json
  • files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
  • files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
  • files_exist - File not found check: .github/workflows/push_dockerhub.yml
  • files_exist - File not found check: .markdownlint.yml
  • files_exist - File not found check: .nf-core.yaml
  • files_exist - File not found check: .yamllint.yml
  • files_exist - File not found check: bin/markdown_to_html.r
  • files_exist - File not found check: conf/aws.config
  • files_exist - File not found check: docs/images/nf-core-differentialabundance_logo.png
  • files_exist - File not found check: lib/Checks.groovy
  • files_exist - File not found check: lib/Completion.groovy
  • files_exist - File not found check: lib/NfcoreTemplate.groovy
  • files_exist - File not found check: lib/Utils.groovy
  • files_exist - File not found check: lib/Workflow.groovy
  • files_exist - File not found check: lib/WorkflowMain.groovy
  • files_exist - File not found check: lib/WorkflowDifferentialabundance.groovy
  • files_exist - File not found check: parameters.settings.json
  • files_exist - File not found check: pipeline_template.yml
  • files_exist - File not found check: Singularity
  • files_exist - File not found check: lib/nfcore_external_java_deps.jar
  • files_exist - File not found check: .travis.yml
  • nextflow_config - Found nf-schema plugin
  • nextflow_config - Config variable found: manifest.name
  • nextflow_config - Config variable found: manifest.nextflowVersion
  • nextflow_config - Config variable found: manifest.description
  • nextflow_config - Config variable found: manifest.version
  • nextflow_config - Config variable found: manifest.homePage
  • nextflow_config - Config variable found: timeline.enabled
  • nextflow_config - Config variable found: trace.enabled
  • nextflow_config - Config variable found: report.enabled
  • nextflow_config - Config variable found: dag.enabled
  • nextflow_config - Config variable found: process.cpus
  • nextflow_config - Config variable found: process.memory
  • nextflow_config - Config variable found: process.time
  • nextflow_config - Config variable found: params.outdir
  • nextflow_config - Config variable found: params.input
  • nextflow_config - Config variable found: validation.help.enabled
  • nextflow_config - Config variable found: manifest.mainScript
  • nextflow_config - Config variable found: timeline.file
  • nextflow_config - Config variable found: trace.file
  • nextflow_config - Config variable found: report.file
  • nextflow_config - Config variable found: dag.file
  • nextflow_config - Config variable found: validation.help.beforeText
  • nextflow_config - Config variable found: validation.help.afterText
  • nextflow_config - Config variable found: validation.help.command
  • nextflow_config - Config variable found: validation.summary.beforeText
  • nextflow_config - Config variable found: validation.summary.afterText
  • nextflow_config - Config variable (correctly) not found: params.nf_required_version
  • nextflow_config - Config variable (correctly) not found: params.container
  • nextflow_config - Config variable (correctly) not found: params.singleEnd
  • nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
  • nextflow_config - Config variable (correctly) not found: params.name
  • nextflow_config - Config variable (correctly) not found: params.enable_conda
  • nextflow_config - Config variable (correctly) not found: params.max_cpus
  • nextflow_config - Config variable (correctly) not found: params.max_memory
  • nextflow_config - Config variable (correctly) not found: params.max_time
  • nextflow_config - Config variable (correctly) not found: params.validationFailUnrecognisedParams
  • nextflow_config - Config variable (correctly) not found: params.validationLenientMode
  • nextflow_config - Config variable (correctly) not found: params.validationSchemaIgnoreParams
  • nextflow_config - Config variable (correctly) not found: params.validationShowHiddenParams
  • nextflow_config - Config timeline.enabled had correct value: true
  • nextflow_config - Config report.enabled had correct value: true
  • nextflow_config - Config trace.enabled had correct value: true
  • nextflow_config - Config dag.enabled had correct value: true
  • nextflow_config - Config manifest.name began with nf-core/
  • nextflow_config - Config variable manifest.homePage began with https://github.com/nf-core/
  • nextflow_config - Config dag.file ended with .html
  • nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
  • nextflow_config - Config manifest.version ends in dev: 1.6.0dev
  • nextflow_config - Config params.custom_config_version is set to master
  • nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
  • nextflow_config - Lines for loading custom profiles found
  • nextflow_config - nextflow.config contains configuration profile test
  • nextflow_config - Config default value correct: params.study_name= study
  • nextflow_config - Config default value correct: params.study_type= rnaseq
  • nextflow_config - Config default value correct: params.study_abundance_type= counts
  • nextflow_config - Config default value correct: params.round_digits= 4
  • nextflow_config - Config default value correct: params.observations_id_col= sample
  • nextflow_config - Config default value correct: params.observations_type= sample
  • nextflow_config - Config default value correct: params.features_id_col= gene_id
  • nextflow_config - Config default value correct: params.features_name_col= gene_name
  • nextflow_config - Config default value correct: params.features_type= gene
  • nextflow_config - Config default value correct: params.features_metadata_cols= gene_id,gene_name,gene_biotype
  • nextflow_config - Config default value correct: params.affy_file_name_col= file
  • nextflow_config - Config default value correct: params.affy_background= true
  • nextflow_config - Config default value correct: params.affy_bgversion= 2
  • nextflow_config - Config default value correct: params.affy_build_annotation= true
  • nextflow_config - Config default value correct: params.features_gtf_feature_type= transcript
  • nextflow_config - Config default value correct: params.features_gtf_table_first_field= gene_id
  • nextflow_config - Config default value correct: params.proteus_measurecol_prefix= LFQ intensity
  • nextflow_config - Config default value correct: params.proteus_norm_function= normalizeMedian
  • nextflow_config - Config default value correct: params.proteus_plotsd_method= violin
  • nextflow_config - Config default value correct: params.proteus_plotmv_loess= true
  • nextflow_config - Config default value correct: params.proteus_palette_name= Set1
  • nextflow_config - Config default value correct: params.filtering_min_abundance= 1.0
  • nextflow_config - Config default value correct: params.filtering_min_samples= 1.0
  • nextflow_config - Config default value correct: params.filtering_min_proportion_not_na= 0.5
  • nextflow_config - Config default value correct: params.immunedeconv_method= quantiseq
  • nextflow_config - Config default value correct: params.immunedeconv_function= deconvolute
  • nextflow_config - Config default value correct: params.exploratory_clustering_method= ward.D2
  • nextflow_config - Config default value correct: params.exploratory_cor_method= spearman
  • nextflow_config - Config default value correct: params.exploratory_n_features= 500
  • nextflow_config - Config default value correct: params.exploratory_whisker_distance= 1.5
  • nextflow_config - Config default value correct: params.exploratory_mad_threshold= -5
  • nextflow_config - Config default value correct: params.exploratory_main_variable= auto_pca
  • nextflow_config - Config default value correct: params.exploratory_assay_names= raw,normalised,variance_stabilised
  • nextflow_config - Config default value correct: params.exploratory_final_assay= variance_stabilised
  • nextflow_config - Config default value correct: params.exploratory_log2_assays= raw,normalised
  • nextflow_config - Config default value correct: params.exploratory_palette_name= Set1
  • nextflow_config - Config default value correct: params.differential_method= deseq2
  • nextflow_config - Config default value correct: params.differential_feature_id_column= gene_id
  • nextflow_config - Config default value correct: params.differential_fc_column= log2FoldChange
  • nextflow_config - Config default value correct: params.differential_pval_column= pvalue
  • nextflow_config - Config default value correct: params.differential_qval_column= padj
  • nextflow_config - Config default value correct: params.differential_min_fold_change= 2.0
  • nextflow_config - Config default value correct: params.differential_max_pval= 1.0
  • nextflow_config - Config default value correct: params.differential_max_qval= 0.05
  • nextflow_config - Config default value correct: params.differential_feature_name_column= gene_name
  • nextflow_config - Config default value correct: params.differential_foldchanges_logged= true
  • nextflow_config - Config default value correct: params.differential_palette_name= Set1
  • nextflow_config - Config default value correct: params.deseq2_test= Wald
  • nextflow_config - Config default value correct: params.deseq2_fit_type= parametric
  • nextflow_config - Config default value correct: params.deseq2_sf_type= ratio
  • nextflow_config - Config default value correct: params.deseq2_min_replicates_for_replace= 7
  • nextflow_config - Config default value correct: params.deseq2_independent_filtering= true
  • nextflow_config - Config default value correct: params.deseq2_lfc_threshold= 0
  • nextflow_config - Config default value correct: params.deseq2_alt_hypothesis= greaterAbs
  • nextflow_config - Config default value correct: params.deseq2_p_adjust_method= BH
  • nextflow_config - Config default value correct: params.deseq2_alpha= 0.1
  • nextflow_config - Config default value correct: params.deseq2_minmu= 0.5
  • nextflow_config - Config default value correct: params.deseq2_vs_method= vst
  • nextflow_config - Config default value correct: params.deseq2_shrink_lfc= true
  • nextflow_config - Config default value correct: params.deseq2_cores= 1
  • nextflow_config - Config default value correct: params.deseq2_vs_blind= true
  • nextflow_config - Config default value correct: params.deseq2_vst_nsub= 1000
  • nextflow_config - Config default value correct: params.limma_method= ls
  • nextflow_config - Config default value correct: params.limma_proportion= 0.01
  • nextflow_config - Config default value correct: params.limma_stdev_coef_lim= 0.1,4
  • nextflow_config - Config default value correct: params.limma_winsor_tail_p= 0.05,0.1
  • nextflow_config - Config default value correct: params.limma_lfc= 0
  • nextflow_config - Config default value correct: params.limma_adjust_method= BH
  • nextflow_config - Config default value correct: params.limma_p_value= 1.0
  • nextflow_config - Config default value correct: params.gsea_permute= phenotype
  • nextflow_config - Config default value correct: params.gsea_nperm= 1000
  • nextflow_config - Config default value correct: params.gsea_scoring_scheme= weighted
  • nextflow_config - Config default value correct: params.gsea_metric= Signal2Noise
  • nextflow_config - Config default value correct: params.gsea_sort= real
  • nextflow_config - Config default value correct: params.gsea_order= descending
  • nextflow_config - Config default value correct: params.gsea_set_max= 500
  • nextflow_config - Config default value correct: params.gsea_set_min= 15
  • nextflow_config - Config default value correct: params.gsea_norm= meandiv
  • nextflow_config - Config default value correct: params.gsea_rnd_type= no_balance
  • nextflow_config - Config default value correct: params.gsea_make_sets= true
  • nextflow_config - Config default value correct: params.gsea_num= 100
  • nextflow_config - Config default value correct: params.gsea_plot_top_x= 20
  • nextflow_config - Config default value correct: params.gsea_rnd_seed= timestamp
  • nextflow_config - Config default value correct: params.gprofiler2_significant= true
  • nextflow_config - Config default value correct: params.gprofiler2_correction_method= gSCS
  • nextflow_config - Config default value correct: params.gprofiler2_max_qval= 0.05
  • nextflow_config - Config default value correct: params.gprofiler2_background_file= auto
  • nextflow_config - Config default value correct: params.gprofiler2_domain_scope= annotated
  • nextflow_config - Config default value correct: params.gprofiler2_min_diff= 1
  • nextflow_config - Config default value correct: params.gprofiler2_palette_name= Blues
  • nextflow_config - Config default value correct: params.shinyngs_build_app= true
  • nextflow_config - Config default value correct: params.report_scree= true
  • nextflow_config - Config default value correct: params.igenomes_base= s3://ngi-igenomes/igenomes/
  • nextflow_config - Config default value correct: params.custom_config_version= master
  • nextflow_config - Config default value correct: params.custom_config_base= https://raw.githubusercontent.com/nf-core/configs/master
  • nextflow_config - Config default value correct: params.publish_dir_mode= copy
  • nextflow_config - Config default value correct: params.validate_params= true
  • nextflow_config - Config default value correct: params.pipelines_testdata_base_path= https://raw.githubusercontent.com/nf-core/test-datasets/
  • files_unchanged - .gitattributes matches the template
  • files_unchanged - .prettierrc.yml matches the template
  • files_unchanged - CODE_OF_CONDUCT.md matches the template
  • files_unchanged - LICENSE matches the template
  • files_unchanged - .github/.dockstore.yml matches the template
  • files_unchanged - .github/ISSUE_TEMPLATE/bug_report.yml matches the template
  • files_unchanged - .github/ISSUE_TEMPLATE/config.yml matches the template
  • files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
  • files_unchanged - .github/PULL_REQUEST_TEMPLATE.md matches the template
  • files_unchanged - .github/workflows/branch.yml matches the template
  • files_unchanged - .github/workflows/linting_comment.yml matches the template
  • files_unchanged - .github/workflows/linting.yml matches the template
  • files_unchanged - assets/email_template.html matches the template
  • files_unchanged - assets/email_template.txt matches the template
  • files_unchanged - assets/sendmail_template.txt matches the template
  • files_unchanged - assets/nf-core-differentialabundance_logo_light.png matches the template
  • files_unchanged - docs/images/nf-core-differentialabundance_logo_light.png matches the template
  • files_unchanged - docs/images/nf-core-differentialabundance_logo_dark.png matches the template
  • files_unchanged - docs/README.md matches the template
  • files_unchanged - .gitignore matches the template
  • files_unchanged - .prettierignore matches the template
  • actions_ci - '.github/workflows/ci.yml' is triggered on expected events
  • actions_ci - '.github/workflows/ci.yml' checks minimum NF version
  • actions_awstest - '.github/workflows/awstest.yml' is triggered correctly
  • actions_awsfulltest - .github/workflows/awsfulltest.yml is triggered correctly
  • actions_awsfulltest - .github/workflows/awsfulltest.yml does not use -profile test
  • readme - README Nextflow minimum version badge matched config. Badge: 24.04.2, Config: 24.04.2
  • readme - README Zenodo placeholder was replaced with DOI.
  • plugin_includes - No wrong validation plugin imports have been found
  • pipeline_name_conventions - Name adheres to nf-core convention
  • template_strings - Did not find any Jinja template strings (0 files)
  • schema_lint - Schema lint passed
  • schema_lint - Schema title + description lint passed
  • schema_params - Schema matched params returned from nextflow config
  • system_exit - No System.exit calls found
  • actions_schema_validation - Workflow validation passed: ci.yml
  • actions_schema_validation - Workflow validation passed: awsfulltest.yml
  • actions_schema_validation - Workflow validation passed: linting.yml
  • actions_schema_validation - Workflow validation passed: branch.yml
  • actions_schema_validation - Workflow validation passed: linting_comment.yml
  • actions_schema_validation - Workflow validation passed: download_pipeline.yml
  • actions_schema_validation - Workflow validation passed: clean-up.yml
  • actions_schema_validation - Workflow validation passed: awstest.yml
  • actions_schema_validation - Workflow validation passed: fix-linting.yml
  • actions_schema_validation - Workflow validation passed: release-announcements.yml
  • actions_schema_validation - Workflow validation passed: template_version_comment.yml
  • merge_markers - No merge markers found in pipeline files
  • modules_json - Only installed modules found in modules.json
  • modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'
  • local_component_structure - local subworkflows directory structure is correct 'subworkflows/local/TOOL/SUBTOOL'
  • base_config - conf/base.config found and not ignored.
  • modules_config - conf/modules.config found and not ignored.
  • modules_config - GUNZIP_GTF found in conf/modules.config and Nextflow scripts.
  • modules_config - GTF_TO_TABLE found in conf/modules.config and Nextflow scripts.
  • modules_config - VALIDATOR found in conf/modules.config and Nextflow scripts.
  • modules_config - AFFY_JUSTRMA_RAW found in conf/modules.config and Nextflow scripts.
  • modules_config - AFFY_JUSTRMA_NORM found in conf/modules.config and Nextflow scripts.
  • modules_config - PROTEUS found in conf/modules.config and Nextflow scripts.
  • modules_config - GEOQUERY_GETGEO found in conf/modules.config and Nextflow scripts.
  • modules_config - CUSTOM_MATRIXFILTER found in conf/modules.config and Nextflow scripts.
  • modules_config - IMMUNEDECONV found in conf/modules.config and Nextflow scripts.
  • modules_config - DESEQ2_NORM found in conf/modules.config and Nextflow scripts.
  • modules_config - DESEQ2_DIFFERENTIAL found in conf/modules.config and Nextflow scripts.
  • modules_config - LIMMA_NORM found in conf/modules.config and Nextflow scripts.
  • modules_config - LIMMA_DIFFERENTIAL found in conf/modules.config and Nextflow scripts.
  • modules_config - VARIANCEPARTITION_DREAM found in conf/modules.config and Nextflow scripts.
  • modules_config - CUSTOM_FILTERDIFFERENTIALTABLE found in conf/modules.config and Nextflow scripts.
  • modules_config - GSEA_GSEA found in conf/modules.config and Nextflow scripts.
  • modules_config - CUSTOM_TABULARTOGSEACLS found in conf/modules.config and Nextflow scripts.
  • modules_config - CUSTOM_TABULARTOGSEAGCT found in conf/modules.config and Nextflow scripts.
  • modules_config - CUSTOM_TABULARTOGSEACHIP found in conf/modules.config and Nextflow scripts.
  • modules_config - GPROFILER2_GOST found in conf/modules.config and Nextflow scripts.
  • modules_config - PLOT_EXPLORATORY found in conf/modules.config and Nextflow scripts.
  • modules_config - PLOT_DIFFERENTIAL found in conf/modules.config and Nextflow scripts.
  • modules_config - SHINYNGS_APP found in conf/modules.config and Nextflow scripts.
  • modules_config - RMARKDOWNNOTEBOOK found in conf/modules.config and Nextflow scripts.
  • modules_config - MAKE_REPORT_BUNDLE found in conf/modules.config and Nextflow scripts.
  • nfcore_yml - Repository type in .nf-core.yml is valid: pipeline
  • nfcore_yml - nf-core version in .nf-core.yml is set to the latest version: 3.2.1

Run details

  • nf-core/tools version 3.2.1
  • Run at 2025-05-29 12:08:46

github-actions[bot] avatar Mar 06 '25 16:03 github-actions[bot]

Thanks @suzannejin - another big one. Will review thoroughly when I can, just give me some time :-)

pinin4fjords avatar Mar 20 '25 10:03 pinin4fjords

> Thanks @suzannejin - another big one. Will review thoroughly when I can, just give me some time :-)

Unfortunately, many parts of the pipeline had to be changed to use the toolsheet args properly; otherwise odd behaviours arise...

However, if it makes the review process easier for you, I can split out a sub-PR with the changes related to params.differential_method and params.functional_method, which are more independent of the rest.

suzannejin avatar Mar 20 '25 10:03 suzannejin

> Thanks @suzannejin - another big one. Will review thoroughly when I can, just give me some time :-)

> Unfortunately, many parts of the pipeline had to be changed to use the toolsheet args properly; otherwise odd behaviours arise...
>
> However, if it makes the review process easier for you, I can split out a sub-PR with the changes related to params.differential_method and params.functional_method, which are more independent of the rest.

That's OK, I just need some dedicated time with it to understand and look for simplifications, as in the previous PRs.

pinin4fjords avatar Mar 20 '25 13:03 pinin4fjords

> Just tackled the utils subworkflow so far, we need to find some way to make the logic easier to follow. I've made a suggestion for one possibility, using functions - it 'runs', but you may want to tweak the logic slightly.

Thank you for working on a clearer version of the tool parsing code @pinin4fjords. I have made the necessary changes, and it looks nice to me. Let me know if you need extra changes.

A question though: do you know why GEOQUERY_GETGEO sometimes fails to download data in CI? https://github.com/nf-core/differentialabundance/actions/runs/14104676287/job/39508400129?pr=443

suzannejin avatar Mar 27 '25 13:03 suzannejin

Just tackled the utils subworkflow so far, we need to find some way to make the logic easier to follow. I've made a suggestion for one possibility, using functions - it 'runs', but you may want to tweak the logic slightly.

Thank you for working on a clearer version of the tool parsing code @pinin4fjords. I have made the necessary changes, and it looks nice to me. Let me know if you need extra changes.

OK good.

I'm seeing what I can do in the main workflow, bear with me as I think about this. I'll submit the review when I'm sure of things.

A question though: do you know why GEOQUERY_GETGEO sometimes fails to download data in the CI? https://github.com/nf-core/differentialabundance/actions/runs/14104676287/job/39508400129?pr=443

No, probably just a flaky web service

pinin4fjords avatar Mar 27 '25 14:03 pinin4fjords

With the toolsheet implemented, can we reduce the number of input parameters because they are redundant now? E.g., do we still need

  • study_type

  • study_abundance_type

  • immunedeconv_run

Also, with respect to https://github.com/nf-core/differentialabundance/issues/367 I was wondering if we should already plan for how to run multiple functional_methods in one pipeline run? E.g. for a typical rnaseq analysis I would like to run multiple combinations of methods and gene sets (for instance, decoupler:progeny (pathway) + decoupler:collecTRI (transcription factors) + immunedeconv:default_signatures (cell-types) + ...)

Hello @grst, indeed that is the idea! Many redundant parameters should be removed with the usage of the toolsheet. Also, the functional subworkflow is already implemented to allow running multiple tools in parallel, so it also covers the case you mention, once those methods are added.

suzannejin avatar Apr 02 '25 08:04 suzannejin

I've been thinking deeply about the current implementation and have a proposal to significantly simplify our approach while maintaining functionality.

Core Proposal: Simplify the Toolsheet Concept

The current complexity stems from treating toolsheet rows as a new conceptual entity, distinct from pipeline parameters. This requires additional validation, parsing, and parameter merging logic. I propose we simplify this by making the toolsheet directly represent pipeline parameters.

Key Changes:

  1. Toolsheet Structure

    • Each row represents a complete set of pipeline parameters

    • Remove specialized columns like diff_args, func_method

    • Use column names that map directly to workflow parameters

    • Single toolsheet can handle all cases since study_type is a workflow parameter

  2. Validation Simplification

    • Use the existing pipeline parameter schema to validate toolsheet rows

    • Eliminates duplicate validation logic

    • Leverages our existing parameter validation infrastructure
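As a concrete illustration, a toolsheet under this proposal might look like the sketch below. The column names are hypothetical examples assumed to mirror pipeline parameter names one-to-one; `analysis_name` identifies each configuration:

```csv
analysis_name,study_type,differential_method,functional_method
deseq2_gsea,rnaseq,deseq2,gsea
limma_gprofiler,rnaseq,limma,gprofiler2
```

Each row would then be validated against the corresponding subset of the pipeline schema and merged over the base params.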

Technical Approach:

  1. Parameter Handling

    • Create a schema for toolsheet validation by extracting relevant fields from pipeline schema

    • Validate each toolsheet row against this schema

    • For each valid toolsheet row:

      • Merge with base pipeline parameters (toolsheet values override pipeline params)
    • Result is an array of complete parameter sets, each representing a different analysis configuration

  2. Workflow Changes

    • Replace direct params usage with tool meta attributes throughout the inner workflow

    • All conditional logic (e.g. `if (params.x)`) will instead branch based on meta attributes

    • Each analysis configuration flows through the workflow as a separate meta object

    • Enables parallel execution of different parameter configurations

    • We maintain e.g. analysis_name all the way through, allowing us to re-associate files at will for reporting.

  3. Schema Reuse

    • Transform pipeline schema into toolsheet schema

    • Keep only properties that match toolsheet headers

    • Maintain validation rules from original schema

    • Add metadata to track parameter origins
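The parameter-merging step described under "Parameter Handling" can be sketched in Python. This is a toy model only: the pipeline itself does this in Groovy within its utils subworkflow, and all param names below are illustrative.

```python
def merge_with_base(base_params, toolsheet_rows):
    """For each toolsheet row, overlay its values on the base pipeline
    params, yielding one complete parameter set per analysis."""
    merged = []
    for row in toolsheet_rows:
        params = dict(base_params)  # start from the pipeline-level params
        # toolsheet values override pipeline params; empty cells are skipped
        params.update({k: v for k, v in row.items() if v is not None})
        merged.append(params)
    return merged

base = {"study_type": "rnaseq", "differential_method": "deseq2", "outdir": "results"}
rows = [
    {"analysis_name": "a1", "differential_method": "limma"},
    {"analysis_name": "a2", "functional_method": "gsea"},
]
configs = merge_with_base(base, rows)
# a1 overrides differential_method; a2 inherits deseq2 from the base params
```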

Implementation Note

While this approach will require more upfront re-architecting of the workflow, the end result will be significantly simpler and more maintainable. The initial investment in refactoring will pay off in cleaner, more straightforward execution logic.

Benefits:

  • More maintainable codebase

  • Reduced complexity

  • Consistent parameter handling

  • Single source of truth for parameter validation

  • Clear parameter inheritance path

  • Parallel execution of different analysis configurations

I've started implementing this approach in PR #448, and have got as far as making it so that every channel element of PIPELINE_INITIALISATION.out.tools is a complete set of parameters, validated against the elements of the pipeline schema. Feel free to take a look; I'll work to test out the main workflow implications over the next couple of days.

Hello @pinin4fjords, thank you for the proposal. I am on holiday these days so I was not able to check your new code, but as soon as I'm back on Friday I will check it!

suzannejin avatar Apr 02 '25 08:04 suzannejin

No hurry @suzannejin , I'm still working on it, enjoy the hols!

pinin4fjords avatar Apr 02 '25 09:04 pinin4fjords

Hello @pinin4fjords, as for the discussion we had about deduplication and avoiding modules running again unnecessarily, it seems to me that this will happen not only for a differential analysis module, but for ANY module in the pipeline, since in the new implementation the meta includes all the params and the toolsheet rows can theoretically contain any parameter (not only differential- and functional-related ones).

This means that to avoid recomputation, the meta should be reparsed at EACH module (instead of only at the differential/functional pairs) to contain only the relevant parameters: that is, the parameters that will be used by the given module plus all the parameters that were used in previous modules to generate its input.

Broadly speaking I think a possible solution could be:

  1. Define base params at the beginning of the workflow, while keeping analysis_name as the key for meta expansion
  2. Before each module, extract the corresponding relevant parameters and add them.

However, this might again break the flow of join/branch/map that is now cleanly implemented at each stage from the very beginning of the workflow. So if we go this way, it would take me quite a bit of time to fix. If so, I would propose doing the deduplication in the next PRs. (Anyway, currently we only run one row at a time, so it would not matter.)

Alternatively, we could keep the current no-special-column toolsheet philosophy, but only allow putting differential/functional-related parameters into the toolsheet.

Of course, let me know if you have alternative ideas and I would be happy to brainstorm :)

Also CC @mirpedrol and @JoseEspinosa to see if they have other suggestions.

suzannejin avatar Apr 09 '25 15:04 suzannejin

@suzannejin I think we will need to collapse-and-expand wherever we want to explicitly avoid duplicated computation and then restore the link of a result to a particular analysis.

I ran out of time before I implemented it for more than the differential example, but I think we should definitely do it for the base set of params, for the steps before differential (validation etc).

I don't think it's necessary to actually limit what params you can use though. Nothing will break, there might just be some repetition.

I considered the option of keeping analysis_name only in the tuples and dynamically adding params, but decided it would mean a lot of repetition to set the meta each time, before the grouping etc. But happy to review if you'd like to experiment.

The no-special-column approach is definitely the way to go I think. It makes the iteration generic, and we could use it for all kinds of things in future.

pinin4fjords avatar Apr 09 '25 16:04 pinin4fjords

@suzannejin I think we will need to collapse-and-expand wherever we want to explicitly avoid duplicated computation and then restore the link of a result to a particular analysis.

I ran out of time before I implemented it for more than the differential example, but I think we should definitely do it for the base set of params, for the steps before differential (validation etc).

I don’t think it’s necessary to actually limit what params you can use though. Nothing will break, there might just be some repetition.

I considered the option of keeping analysis_name only in the tuples and dynamically adding params, but decided it would mean a lot of repetition to set the meta each time, before the grouping etc. But happy to review if you’d like to experiment.

The no-special-column approach is definitely the way to go I think. It makes the iteration generic, and we could use it for all kinds of things in future.

So, we would need to do collapse-and-expand to provide only the relevant params for every single module. Otherwise resume would fail (imagine a single parameter changing and the entire pipeline failing to resume).

suzannejin avatar Apr 11 '25 09:04 suzannejin

My feeling is that, in the POC as-is, you can already iterate on absolutely any param, which is pretty nice. The cost is that there is some repeated computation, but tbh nothing this workflow does is very computationally intensive, so that's not a huge deal for the vast majority of use cases; it's already a good baseline.

We have already set things up so that people with identical configuration up to the differential workflow will not repeat steps. I think if we added just one further optimisation for the 'base params', so that e.g. someone doing different differential analyses on the same workflow inputs wouldn't repeat the validation, that will be sufficient for the moment. If any others become important to folks, they can be added progressively later.

We could do the collapse-and-expand on every module, but it does add a bit of noise to the code, so it makes sense to hold off until we actually have demand for that.
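The collapse-and-expand idea discussed here can be sketched as follows. This is a toy Python model of the channel logic (the pipeline does this with Nextflow operators in Groovy): collapse groups analyses whose relevant params are identical so the shared step runs once, and expand re-associates the shared result with every analysis_name in the group.

```python
def collapse(metas, relevant_keys):
    """Group analyses whose relevant params are identical, remembering
    which analysis_names belong to each group."""
    groups = {}
    for meta in metas:
        key = tuple(sorted((k, meta[k]) for k in relevant_keys if k in meta))
        groups.setdefault(key, []).append(meta["analysis_name"])
    return groups

def expand(groups, results):
    """Re-associate each shared result with every analysis in its group."""
    return {name: results[key] for key, names in groups.items() for name in names}

metas = [
    {"analysis_name": "a1", "differential_method": "deseq2", "functional_method": "gsea"},
    {"analysis_name": "a2", "differential_method": "deseq2", "functional_method": "gprofiler2"},
]
groups = collapse(metas, ["differential_method"])  # both analyses share deseq2
results = {key: f"result_for_{dict(key)['differential_method']}" for key in groups}
per_analysis = expand(groups, results)  # a1 and a2 get the same shared result
```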

pinin4fjords avatar Apr 11 '25 10:04 pinin4fjords

Hello! I had a first look at the suggestion of providing all parameters through the toolsheet. While I am not completely against it, I am a bit worried that it introduces a big divergence from other nf-core pipelines, which will make it difficult for other people to contribute, and also for PR reviews. Another concern I have is that it doesn't make sense for all parameters to be provided through a toolsheet, since some must remain the same during the whole Nextflow run; I am wondering how you will deal with a parameter that has different values depending on the row, and whether this will make things confusing for users.

Hello! As for the parameters that must be consistent through the entire run and not changed in toolsheet (eg. outdir), we could explicitly create a file in assets with the list of those params. Then the parsing of the toolsheet should throw an error if they are included.

But yes, I understand your concern about the divergence from the usual nf-core standards. While the previous multi-tool implementation was great to allow iteration over tools of interest, the multi-config implementation opens broader possibilities.

Now, if the toolsheet can only contain the tool-specific params, while excluding the rest (eg. outdir, email, genome, input data, observations_, features_, etc), then essentially we end up with a toolsheet with a broader scope of tools than the previous differential/functional implementation, but not a full multi-config behavior. We could go for this option.

Let's see what is your opinion @pinin4fjords

suzannejin avatar Apr 11 '25 15:04 suzannejin

We could do the collapse-and-expand on every module, but it does add a bit of noise to the code, so it makes sense to hold off until we actually have demand for that.

Actually I managed to do collapse-and-expand as you were doing, for all the modules, without adding much noise to the code. It's just a matter of calling a collapse-meta function before each module and an expand-meta function after. The codebase stays pretty clean. I will commit it on Monday so you can check it.

suzannejin avatar Apr 11 '25 15:04 suzannejin

Hello! I had a first look at the suggestion of providing all parameters through the toolsheet. While I am not completely against it, I am a bit worried that it introduces a big divergence from other nf-core pipelines, which will make it difficult for other people to contribute, and also for PR reviews. Another concern I have is that it doesn't make sense for all parameters to be provided through a toolsheet, since some must remain the same during the whole Nextflow run; I am wondering how you will deal with a parameter that has different values depending on the row, and whether this will make things confusing for users.

Hello! As for the parameters that must be consistent through the entire run and not changed in toolsheet (eg. outdir), we could explicitly create a file in assets with the list of those params. Then the parsing of the toolsheet should throw an error if they are included.

But yes, I understand your concern about the divergence from the usual nf-core standards. While the previous multi-tool implementation was great to allow iteration over tools of interest, the multi-config implementation opens broader possibilities.

Now, if the toolsheet can only contain the tool-specific params, while excluding the rest (eg. outdir, email, genome, input data, observations_, features_, etc), then essentially we end up with a toolsheet with a broader scope of tools than the previous differential/functional implementation, but not a full multi-config behavior. We could go for this option.

Let's see what is your opinion @pinin4fjords

Which parameters do you think MUST stay the same throughout? I can't think of any off the top of my head. We could even iterate the input matrices etc if we chose to do so. We can use validation logic to restrict things where necessary.

As for nf-core standards, you may be somewhat correct, but the original toolsheet proposal is not much different conceptually: each row is really just a set of parameters, restricted to a narrow set you 'bless'. You had not just tools, but all the associated parameters etc., which actually make up a significant proportion of the workflow's parameters. The major difference was encoding: you had differential parameters as `--param foo` strings in special new columns. All I've done is simplify that to refer to the workflow params directly (rather than requiring the CLI encoding), and widened the set of params allowed.

I totally get the point about making it difficult to contribute, I have had that concern in my head throughout this endeavour which is why I've been so keen to simplify wherever possible. But if that's a blocker for one of these solutions, then IMO it's a blocker for both of them.

On the other hand, if we are going to do the iteration, I would rather it was implemented in a way that had the widest usefulness.

Of course a far better solution to all of this would be if Nextflow workflows were iterable themselves, but they're not, so here we are!

pinin4fjords avatar Apr 14 '25 08:04 pinin4fjords

Which parameters do you think MUST stay the same throughout? I can't think of any off the top of my head. We could even iterate the input matrices etc if we chose to do so. We can use validation logic to restrict things where necessary.

For instance, all the output parameters, like the outdir, anything describing the report, etc. If you allow changes to these, what would be the difference between providing a new row in the config and running a different Nextflow command? This can work with the new config, but I think it's more about concept than functionality.

As for nf-core standards, you may be somewhat correct, but the original toolsheet proposal is not much different conceptually: each row is really just a set of parameters, restricted to a narrow set you 'bless'. You had not just tools, but all the associated parameters etc., which actually make up a significant proportion of the workflow's parameters. The major difference was encoding: you had differential parameters as `--param foo` strings in special new columns. All I've done is simplify that to refer to the workflow params directly (rather than requiring the CLI encoding), and widened the set of params allowed.

The original toolsheet proposal looks simpler to me, but this could very well be because I am more used to it. Since it allows providing the parameters of specific tools only, and it's only related to the tools, not any other param that affects the whole workflow or pipeline run, it's more contained/specific.
Your suggestion of referring to the param names from the toolsheet columns makes the code cleaner, indeed. I think this is a good improvement. Do you think, from a user point of view, it would be easier to provide each parameter in a separate column than all of them in a single column? I know in other pipelines where the number of parameters is too high, they allow providing a string which contains all possible parameters for one tool. This is a pipeline dev decision, and it was done with separate parameters in this pipeline before, so I am just mentioning it as an idea, not suggesting a whole change of the pipeline params now 🙂

mirpedrol avatar Apr 14 '25 09:04 mirpedrol

Do you think, from a user point of view, it would be easier to provide each parameter in a separate column than all of them in a single column? I know in other pipelines where the number of parameters is too high, they allow providing a string which contains all possible parameters for one tool.

The toolsheet can also be a YAML file (because nf-validation supports YAML and CSV out-of-the-box). I think this can be cleaner in some cases, as columns that are not used by one specific method do not need to be specified. But as both are supported, this can be a user choice.

grst avatar Apr 14 '25 09:04 grst

Which parameters do you think MUST stay the same throughout? I can't think of any off the top of my head. We could even iterate the input matrices etc if we chose to do so. We can use validation logic to restrict things where necessary.

For instance, all the output parameters, like the outdir, anything describing the report, etc. If you allow changes to these, what would be the difference between providing a new row in the config and running a different Nextflow command? This can work with the new config, but I think it's more about concept than functionality.

Actually, I think that, right now, there should be one report per 'row'. The report HTML is already too big, it can't take multiple iterations. I also think each iteration could have its own output (sub)dir.

And I think all of this IS equivalent to multiple Nextflow runs, that's kind of my point. If I could have figured out a way of running workflows iterated over channels, I'd have been suggesting that :-).

As for nf-core standards, you may be somewhat correct, but the original toolsheet proposal is not much different conceptually: each row is really just a set of parameters, restricted to a narrow set you 'bless'. You had not just tools, but all the associated parameters etc., which actually make up a significant proportion of the workflow's parameters. The major difference was encoding: you had differential parameters as `--param foo` strings in special new columns. All I've done is simplify that to refer to the workflow params directly (rather than requiring the CLI encoding), and widened the set of params allowed.

The original toolsheet proposal looks simpler to me, but this could very well be because I am more used to it. Since it allows providing the parameters of specific tools only, and it's only related to the tools, not any other param that affects the whole workflow or pipeline run, it's more contained/specific. Your suggestion of referring to the param names from the toolsheet columns makes the code cleaner, indeed. I think this is a good improvement. Do you think, from a user point of view, it would be easier to provide each parameter in a separate column than all of them in a single column?

100%. If I'm a user used to working with workflow params, I can transition quickly to providing them as columns in the 'toolsheet' (though this might need to be renamed!). A whole new syntax (differential_params or whatever) is just a new barrier.

I know in other pipelines where the number of parameters is too high, they allow providing a string which contains all possible parameters for one tool. This is a pipeline dev decision, and it was done with separate parameters in this pipeline before, so I am just mentioning it as an idea, not suggesting a whole change of the pipeline params now 🙂

Yes, we intentionally went this way, with lots of params. I like it because you can document options better.

pinin4fjords avatar Apr 14 '25 09:04 pinin4fjords

Do you think, from a user point of view, it would be easier to provide each parameter in a separate column than all of them in a single column? I know in other pipelines where the number of parameters is too high, they allow providing a string which contains all possible parameters for one tool.

The toolsheet can also be a YAML file (because nf-validation supports YAML and CSV out-of-the-box). I think this can be cleaner in some cases, as columns that are not used by one specific method do not need to be specified. But as both are supported, this can be a user choice.

Yes, the YAML thought occurred to me too, could be nicer than sparse CSV files.

pinin4fjords avatar Apr 14 '25 09:04 pinin4fjords

We could do the collapse-and-expand on every module, but it does add a bit of noise to the code, so it makes sense to hold off until we actually have demand for that.

Hello!! So here is the code running without problems on resume. It basically follows your POC for collapse-and-expand, but applied to each module. I believe it is still pretty clean. I will add back immunedeconv and the reports soonish, and resolve the conflicts. Hopefully with that we can soon close this PR!

suzannejin avatar Apr 14 '25 14:04 suzannejin

Do you think, from a user point of view, it would be easier to provide each parameter in a separate column than all of them in a single column? I know in other pipelines where the number of parameters is too high, they allow providing a string which contains all possible parameters for one tool.

The toolsheet can also be a YAML file (because nf-validation supports YAML and CSV out-of-the-box). I think this can be cleaner in some cases, as columns that are not used by one specific method do not need to be specified. But as both are supported, this can be a user choice.

yes! we could add that feature soonish in the next PRs.

suzannejin avatar Apr 14 '25 14:04 suzannejin

Which parameters do you think MUST stay the same throughout? I can't think of any off the top of my head. We could even iterate the input matrices etc if we chose to do so. We can use validation logic to restrict things where necessary.

For instance, all the output parameters, like the outdir, anything describing the report, etc. If you allow changes to these, what would be the difference between providing a new row in the config and running a different Nextflow command? This can work with the new config, but I think it's more about concept than functionality.

Actually, I think that, right now, there should be one report per 'row'. The report HTML is already too big, it can't take multiple iterations. I also think each iteration could have its own output (sub)dir.

I think each iteration should produce its own report file. I also think each iteration should have its own subdir, but a nextflow run should have one output dir.

The original toolsheet proposal looks simpler to me, but this could very well be because I am more used to it. Since it allows providing the parameters of specific tools only, and it's only related to the tools, not any other param that affects the whole workflow or pipeline run, it's more contained/specific. Your suggestion of referring to the param names from the toolsheet columns makes the code cleaner, indeed. I think this is a good improvement. Do you think, from a user point of view, it would be easier to provide each parameter in a separate column than all of them in a single column?

100%. If I'm a user used to working with workflow params, I can transition quickly to providing them as columns in the 'toolsheet' (though this might need to be renamed!). A whole new syntax (differential_params or whatever) is just a new barrier.

I personally like that the toolsheet columns match the workflow params. What I am still not sure about, as I mentioned before, is whether to allow every param to be iterable through the toolsheet. Maybe I am also biased by tool-mentality; I do see it makes a lot of sense to allow users to iterate over different tool params, but I don't think it makes sense to iterate over different outdirs, for example (we should make use of proper subdirs for different iterations).

Of course, these are details. As long as we agree on the main ideas, they can be progressively improved in later PRs.

suzannejin avatar Apr 14 '25 14:04 suzannejin

I personally like that the toolsheet columns match the workflow params. What I am still not sure about, as I mentioned before, is whether to allow every param to be iterable through the toolsheet.

My question is more "why should we prevent a user from doing so"?

pinin4fjords avatar Apr 15 '25 08:04 pinin4fjords

Just some rambling. We were talking with @suzannejin and wanted to suggest an idea and write it down to organise our minds.

The current implementation:

  1. Convert nextflow_schema.json into toolsheet_schema.json (only adding columns present in toolsheet.csv) and full_toolsheet_schema.json (all params)
  2. Extract params from toolsheet.csv with samplesheetToList() and toolsheet_schema.json + validation
  3. Add missing params from pipeline params to toolsheet map -> to json file full_toolsheet.json
  4. Use samplesheetToList() with full_toolsheet.json and full_toolsheet_schema.json

Idea/suggestion: Create a new nf-schema function called validateMap(). This function can receive a map containing all params and a JSON schema; in our example, the toolsheet map and nextflow_schema.json. The new steps would be:

  1. Copy pipeline params to paramsMap, update with values from toolsheet.csv
  2. use validateMap() with paramsMap and nextflow_schema.json
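The proposed validateMap() can be illustrated with a toy Python version. This is only a sketch of the idea: the real function would live in nf-schema (Groovy) and implement full JSON Schema semantics; the schema fragment and param names below are hypothetical and only required-key, type, and enum checks are modelled.

```python
def validate_map(params, schema):
    """Check a merged params map against a simplified JSON-Schema-like
    dict, returning a list of validation error messages."""
    errors = []
    for key in schema.get("required", []):
        if key not in params:
            errors.append(f"missing required param: {key}")
    type_map = {"string": str, "integer": int, "number": (int, float), "boolean": bool}
    for key, spec in schema.get("properties", {}).items():
        if key in params and "type" in spec:
            if not isinstance(params[key], type_map[spec["type"]]):
                errors.append(f"{key}: expected {spec['type']}")
        if key in params and "enum" in spec and params[key] not in spec["enum"]:
            errors.append(f"{key}: not one of {spec['enum']}")
    return errors

schema = {
    "required": ["study_type"],
    "properties": {
        "study_type": {"type": "string", "enum": ["rnaseq", "affy_array"]},
        "differential_min_fold_change": {"type": "number"},
    },
}
ok = validate_map({"study_type": "rnaseq", "differential_min_fold_change": 1.5}, schema)
bad = validate_map({"differential_min_fold_change": "high"}, schema)
# ok is empty; bad reports the missing required key and the type mismatch
```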

mirpedrol avatar Apr 15 '25 09:04 mirpedrol

Just some rambling. We were talking with @suzannejin and wanted to suggest an idea and write it down to organise our minds.

The current implementation:

  1. Convert nextflow_schema.json into toolsheet_schema.json (only adding columns present in toolsheet.csv) and full_toolsheet_schema.json (all params)
  2. Extract params from toolsheet.csv with samplesheetToList() and toolsheet_schema.json + validation
  3. Add missing params from pipeline params to toolsheet map -> to json file full_toolsheet.json
  4. Use samplesheetToList() with full_toolsheet.json and full_toolsheet_schema.json

Idea/suggestion: Create a new nf-schema function called validateMap(). This function can receive a map containing all params and a JSON schema; in our example, the toolsheet map and nextflow_schema.json. The new steps would be:

  1. Copy pipeline params to paramsMap, update with values from toolsheet.csv
  2. use validateMap() with paramsMap and nextflow_schema.json

This sounds excellent. I just lacked the nf-schema mad skillz to accomplish such a thing.

pinin4fjords avatar Apr 15 '25 10:04 pinin4fjords

[!WARNING] Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.2.0. Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

nf-core-bot avatar Apr 30 '25 12:04 nf-core-bot

Hello @pinin4fjords, I applied some last modifications and also most of your suggestions. Only two problems remain:

  • Parsing the toolsheet relies on 1) first running validateParams on the params, and 2) using samplesheetToList on the toolsheet with a partial schema derived from nextflow_schema.json. This is not ideal, as the params and toolsheet params are checked independently, so some required params might go missing unnoticed, etc. For this, @mirpedrol offered to implement a validateMap() function in nf-schema.
  • getRelevantParams currently gets all the base params plus the ones starting with the module-specific prefix (e.g. differential_, filtering_). While I think this serves as a very good baseline, you suggested an alternative implementation that relies on the Nextflow schema to classify the params into specific blocks.

IMO, these two problems are well contained within specific functions in the utils subworkflow, meaning they can easily be changed in the next PRs without touching the rest of the pipeline. I would suggest first merging this PR, which serves as a good baseline with all the toolsheet-related features ready. What do you think? Would you mind reviewing again so we can move forward?
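The prefix-based baseline that getRelevantParams implements can be sketched like this. A toy Python model: the actual function is Groovy, and the base-param set here is hypothetical.

```python
# Always-relevant params shared by every module (hypothetical names)
BASE_KEYS = {"study_type", "input", "matrix"}

def get_relevant_params(params, module_prefix):
    """Prefix-based baseline: keep base params plus any param whose name
    starts with the module's prefix (e.g. 'differential_', 'filtering_'),
    so resume is unaffected by changes to unrelated params."""
    return {
        k: v for k, v in params.items()
        if k in BASE_KEYS or k.startswith(module_prefix)
    }

params = {
    "study_type": "rnaseq",
    "input": "samplesheet.csv",
    "differential_method": "deseq2",
    "differential_min_fold_change": 2,
    "filtering_min_abundance": 1,
}
diff_params = get_relevant_params(params, "differential_")
# filtering_min_abundance is dropped; base and differential_* params remain
```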

suzannejin avatar Apr 30 '25 15:04 suzannejin

Thanks @suzannejin , will review again as soon as I can. Bear with me, it's a busy time for us.

pinin4fjords avatar May 01 '25 08:05 pinin4fjords

Hi @pinin4fjords, I modified the code to use category-based params in the simplified meta. Let me know if it looks better to you now :)

suzannejin avatar May 12 '25 12:05 suzannejin