demultiplex skip over empty fastqs after demultiplex and continue gracefully

PR checklist

[x] This comment contains a description of changes (with reason).
[ ] If you've fixed a bug or added code that should be tested, add tests!
[ ] If you've added a new tool - have you followed the pipeline conventions in the contribution docs
[ ] If necessary, also make a PR on the nf-core/demultiplex branch on the nf-core/test-datasets repository.
[x] Make sure your code lints (nf-core lint).
[x] Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
[ ] Usage Documentation in docs/usage.md is updated.
[ ] Output Documentation in docs/output.md is updated.
[x] CHANGELOG.md is updated.
[ ] README.md is updated (including new tool citations and authors/contributors).

Title of PR:

"Enhanced Resilience to Empty or Invalid FASTQ Files with Logging Functionality"

Description:

Changes Made:

Introduction of a Log File:
- Implemented a global log file (${params.outdir}/invalid_fastqs.log) for tracking empty or invalid FASTQ files, enhancing post-run analysis.
Modified generate_fastq_meta Function:
- Updated generate_fastq_meta to gracefully handle empty or invalid FASTQ files, preventing pipeline interruption.
- Added logFile as a new parameter for logging these occurrences.
New Function - appendToLogFile:
- Introduced appendToLogFile, a utility function for appending messages to the log file. This function improves maintainability and includes a check to handle Groovy GStrings.

Reason for Changes:

Improving Pipeline Robustness: Aims to bolster the pipeline's resilience against edge cases like empty or invalid FASTQ files.
Enhanced Debugging and Transparency: Offers a clear record of skipped files, aiding in debugging and ensuring data quality.
Maintainability and Code Organization: The addition of appendToLogFile and the parameterization of logFile align with best coding practices, enhancing code reusability and organization.

Impact:

These enhancements bolster the pipeline's robustness without altering its core functionality, ensuring a smooth experience when encountering empty or invalid FASTQ files.

Use Cases Addressed:

Incorrect Barcode in Sample Sheet:
- In cases where an incorrect set of barcodes is provided for a sample, resulting in an empty FASTQ file, the pipeline previously halted. This was problematic when processing multiple samples (e.g., 99 valid samples and 1 with an incorrect barcode), as valid FASTQ files would remain in the workdir and not be moved to the outdir. With the proposed changes, the pipeline will continue to the end, allowing for the release of data for valid samples and logging the problematic sample for further investigation.
Library Preparation Errors:
- Instances where library preparation for a specific sample is faulty can also lead to empty FASTQ files. The updated pipeline will now allow for continuous processing to the end, enabling users to utilize MultiQC outputs to identify which samples were problematic. The invalid_fastqs.log file will assist in pinpointing these samples for reprocessing or further laboratory investigation.

Jan 12 '24 20:01 glichtenstein

`nf-core lint` overall result: Failed :x:

Posted for pipeline commit 6c63769

+| ✅ 157 tests passed       |+
#| ❔   2 tests were ignored |#
!| ❗   4 tests had warnings |!
-| ❌  10 tests failed       |-

:x: Test failures:

files_exist - File must be removed: lib/nfcore_external_java_deps.jar
files_unchanged - .github/workflows/branch.yml does not match the template
files_unchanged - .github/workflows/linting_comment.yml does not match the template
files_unchanged - .github/workflows/linting.yml does not match the template
files_unchanged - assets/email_template.html does not match the template
files_unchanged - assets/email_template.txt does not match the template
files_unchanged - assets/nf-core-demultiplex_logo_light.png does not match the template
files_unchanged - docs/images/nf-core-demultiplex_logo_light.png does not match the template
files_unchanged - docs/images/nf-core-demultiplex_logo_dark.png does not match the template
files_unchanged - pyproject.toml does not match the template

:heavy_exclamation_mark: Test warnings:

pipeline_todos - TODO string in README.md: Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.
pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
pipeline_todos - TODO string in WorkflowDemultiplex.groovy: Optionally add in-text citation tools to this list.
pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required

:grey_question: Tests ignored:

files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
actions_ci - actions_ci

:white_check_mark: Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: assets/nf-core-demultiplex_logo_light.png
files_exist - File found: conf/modules.config
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/images/nf-core-demultiplex_logo_light.png
files_exist - File found: docs/images/nf-core-demultiplex_logo_dark.png
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: lib/NfcoreTemplate.groovy
files_exist - File found: lib/Utils.groovy
files_exist - File found: lib/WorkflowMain.groovy
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: conf/igenomes.config
files_exist - File found: .github/workflows/awstest.yml
files_exist - File found: .github/workflows/awsfulltest.yml
files_exist - File found: lib/WorkflowDemultiplex.groovy
files_exist - File found: modules.json
files_exist - File found: pyproject.toml
files_exist - File not found check: Singularity
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: docs/images/nf-core-demultiplex_logo.png
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: .travis.yml
nextflow_config - Config variable found: manifest.name
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: manifest.homePage
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: params.validationShowHiddenParams
nextflow_config - Config variable found: params.validationSchemaIgnoreParams
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config manifest.name began with nf-core/
nextflow_config - Config variable manifest.homePage began with https://github.com/nf-core/
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config manifest.version ends in dev: 1.5.0dev
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
nextflow_config - nextflow.config contains configuration profile test
nextflow_config - Config default value correct: params.trim_fastq= true
nextflow_config - Config default value correct: params.custom_config_version= master
nextflow_config - Config default value correct: params.custom_config_base= https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Config default value correct: params.max_cpus= 16
nextflow_config - Config default value correct: params.max_memory= 128.GB
nextflow_config - Config default value correct: params.max_time= 240.h
nextflow_config - Config default value correct: params.publish_dir_mode= copy
nextflow_config - Config default value correct: params.max_multiqc_email_size= 25.MB
nextflow_config - Config default value correct: params.validate_params= true
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - CODE_OF_CONDUCT.md matches the template
files_unchanged - LICENSE matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/CONTRIBUTING.md matches the template
files_unchanged - .github/ISSUE_TEMPLATE/config.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/PULL_REQUEST_TEMPLATE.md matches the template
files_unchanged - assets/sendmail_template.txt matches the template
files_unchanged - docs/README.md matches the template
files_unchanged - lib/NfcoreTemplate.groovy matches the template
files_unchanged - .gitignore matches the template
files_unchanged - .prettierignore matches the template
actions_awstest - '.github/workflows/awstest.yml' is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml does not use -profile test
readme - README Nextflow minimum version badge matched config. Badge: 23.04.0, Config: 23.04.0
readme - README Zenodo placeholder was replaced with DOI.
pipeline_name_conventions - Name adheres to nf-core convention
template_strings - Did not find any Jinja template strings (160 files)
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: release-announcements.yml
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: fix-linting.yml
actions_schema_validation - Workflow validation passed: awsfulltest.yml
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: clean-up.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
actions_schema_validation - Workflow validation passed: awstest.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
multiqc_config - 'assets/multiqc_config.yml' contains report_section_order
multiqc_config - 'assets/multiqc_config.yml' contains export_plots
multiqc_config - 'assets/multiqc_config.yml' contains report_comment
multiqc_config - 'assets/multiqc_config.yml' follows the ordering scheme of the minimally required plugins.
multiqc_config - 'assets/multiqc_config.yml' contains a matching 'report_comment'.
multiqc_config - 'assets/multiqc_config.yml' contains 'export_plots: true'.
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'

Run details

nf-core/tools version 2.12.1
Run at 2024-02-02 18:54:59

Jan 12 '24 20:01 github-actions[bot]

Question for Reviewers about setting up a test for the PR: In our in-house testing using BCLConvert, we have a SampleSheet.csv where one of the samples has wrong barcodes assigned (both Index and Index2). This leads to the generation of empty FASTQ files for that specific sample, which is one of the scenarios this PR aim to address.

However, for the nf-core GitHub tests, we are currently using this sample sheet for testing demultiplex: flowcell_input.csv. This sample sheet doesn't include any incorrect indices that would result in empty FASTQ files.

My question is: Would it be acceptable to modify this standard sample sheet so that it includes a sample (sample2) with incorrect indices? This modification would enable us to test the pipeline's ability to handle empty FASTQ files in a controlled and predictable manner, which is crucial for validating the new features.

Perhaps, we could also modify one of these with a s25 and add some random barcodes that will yield empty s25 fastqs:

Your guidance on this matter would be greatly appreciated, as it will help ensure that our tests are comprehensive and reflective of real-world scenarios where such issues might arise.

Jan 24 '24 21:01 glichtenstein

Thanks for the PR!

Gonna wait for @matthdsm on this one.

In the meantime could you squash some of the commits? There's a lot of them that are linting or debugging, or if you'd like we can just squash the whole PR, there's not many changes made. 89 commits for 96 lines of code just seems like a lot.

Jan 31 '24 17:01 edmundmiller

@glichtenstein about the test data, yes I think it would make a lot of sense to have a samplesheet with faulty barcodes. Maybe it would be best to just create a new samplesheet so we have two test cases, one for a normal run and one with faulty barcodes.

I've been working on setting up a test dataset from a NovaSeq6000, would be nice if we could have second samplesheet there too (see https://github.com/nf-core/test-datasets/blob/demultiplex/testdata/NovaSeq6000)

Feb 02 '24 09:02 Aratz

Thanks for the PR!

Gonna wait for @matthdsm on this one.

In the meantime could you squash some of the commits? There's a lot of them that are linting or debugging, or if you'd like we can just squash the whole PR, there's not many changes made. 89 commits for 96 lines of code just seems like a lot.

Thanks @edmundmiller ! Very good idea, I was not aware that one could squash commits, I will proceed with an interactive rebase, thus squashing all the commits into one that is more descriptive. Thanks for the heads-up.

Feb 02 '24 14:02 glichtenstein

LGTM!

Thanks for the review @matthdsm !

Feb 02 '24 14:02 glichtenstein

@glichtenstein about the test data, yes I think it would make a lot of sense to have a samplesheet with faulty barcodes. Maybe it would be best to just create a new samplesheet so we have two test cases, one for a normal run and one with faulty barcodes.

I've been working on setting up a test dataset from a NovaSeq6000, would be nice if we could have second samplesheet there too (see https://github.com/nf-core/test-datasets/blob/demultiplex/testdata/NovaSeq6000)

@Aratz Thanks for the reviews! Yes, that is a very good idea, I will get this dataset, and do some local testing with faulty barcodes, then I could open a branch for the NovaSeq6000 SampleSheet.csv. Do you rather have two SampleSheets? like a SampleSheet_faulty.csv? or would you prefer that I add a new sample row for the current SampleSheet.csv?

Feb 02 '24 14:02 glichtenstein

I think it's great to be able to complete the pipeline even with faulty barcodes! However, sometimes we might want the pipeline to crash to avoid risking this issue going under the radar. Maybe we could have an extra parameter e.g. skip_empty_fastq ? (could be true by default).

This is a very good suggestion, we thought of it but it was going to require more business logic, but definitely it will be a good addition. I will explore options and come back to you. In this regards, shall I merge this PR into dev and then open a new branch for it, or would you suggest I keep working on the '--skip_empty_fastq' in the same PR?

Also, if I'm not mistaken subworkflows/nf-core/bcl_demultiplex/main.nf is automatically imported from https://github.com/nf-core/modules, so your PR should be made against that repository first.

Oh, I was not aware of that relation, I will take a look, and make the PR againt https://github.com/nf-core/modules if possible.
Thanks a lot.

Feb 02 '24 14:02 glichtenstein

@Aratz I have opened a PR in modules 🤞

Feb 02 '24 21:02 glichtenstein

Thanks for the PR!

Gonna wait for @matthdsm on this one.

In the meantime could you squash some of the commits? There's a lot of them that are linting or debugging, or if you'd like we can just squash the whole PR, there's not many changes made. 89 commits for 96 lines of code just seems like a lot.

Hi @edmundmiller, after the rebase to squash all the commits, I do not pass the nf-core inting, do you think something has changed with the tests?

Feb 02 '24 21:02 glichtenstein

@Aratz Thanks for the reviews! Yes, that is a very good idea, I will get this dataset, and do some local testing with faulty barcodes, then I could open a branch for the NovaSeq6000 SampleSheet.csv. Do you rather have two SampleSheets? like a SampleSheet_faulty.csv? or would you prefer that I add a new sample row for the current SampleSheet.csv?

Yes I think it would be best with two samplesheets (there is one in the .tar.gz file so you might need to upload a new one too. I thinking both datasets can sit in the Novaseq6000 folder, and you can document why they differ in the README file.

This is a very good suggestion, we thought of it but it was going to require more business logic, but definitely it will be a good addition. I will explore options and come back to you. In this regards, shall I merge this PR into dev and then open a new branch for it, or would you suggest I keep working on the '--skip_empty_fastq' in the same PR?

I'd say you can close this one and open a new one with the automatic module update and work on the new parameter there :+1:

Oh, I was not aware of that relation, I will take a look, and make the PR againt https://github.com/nf-core/modules if possible. Thanks a lot.

Great! I'll check it out!

Feb 05 '24 07:02 Aratz

Closing this PR, it was opened on nf-core/modules instead: https://github.com/nf-core/modules/pull/4842

May 08 '24 16:05 glichtenstein

skip over empty fastqs after demultiplex and continue gracefully

PR checklist

Title of PR:

Description:

Reason for Changes:

Impact:

Use Cases Addressed:

nf-core lint overall result: Failed :x:

:x: Test failures:

:heavy_exclamation_mark: Test warnings:

:grey_question: Tests ignored:

:white_check_mark: Tests passed:

Run details

`nf-core lint` overall result: Failed :x: