sarek icon indicating copy to clipboard operation
sarek copied to clipboard

Check if flowcell id matches for paired samples

Open pmoris opened this issue 1 year ago • 6 comments

I noticed this comment about checking the flowcell ID for paired samples while constructing GATK read groups. I was adapting the read group code for a custom pipeline and attempted a quick fix, so I thought I'd contribute it back to sarek.

While constructing the read group from paired fastq samples, perform a check to ensure that the id is the same for (the first reads) in fastq_1 and fastq_2. Exit out with an error otherwise and report the problematic sample and file paths.

Incidentally, while researching read groups I came across the following recommendations: https://support.sentieon.com/appnotes/read_groups/. Would it be worth updating some of the fields to match these guidelines?

PR checklist

  • [x] This comment contains a description of changes (with reason).
  • [ ] If you've fixed a bug or added code that should be tested, add tests!
    • => Only tested this manually, but happy to add a proper test if you could give me a starting point. Is there already an existing test for samplesheet validation that I can add this too? I guess I will need to add "corrupt" fastq files to the nf-core test repo?
  • [ ] If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • [ ] If necessary, also make a PR on the nf-core/sarek branch on the nf-core/test-datasets repository.
  • [x] Make sure your code lints (nf-core lint).
  • [x] Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • [x] Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • [ ] Usage Documentation in docs/usage.md is updated.
  • [ ] Output Documentation in docs/output.md is updated.
  • [ ] CHANGELOG.md is updated.
    • => will do this after submitting the PR so that I can link to it.
  • [ ] README.md is updated (including new tool citations and authors/contributors).
    • => should I do this even for such a minor contribution?

pmoris avatar Sep 26 '24 12:09 pmoris

nf-core pipelines lint overall result: Passed :white_check_mark: :warning:

Posted for pipeline commit 82615ad

+| ✅ 215 tests passed       |+
#| ❔  11 tests were ignored |#
!| ❗   4 tests had warnings |!

:heavy_exclamation_mark: Test warnings:

  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes

:grey_question: Tests ignored:

:white_check_mark: Tests passed:

Run details

  • nf-core/tools version 3.0.2
  • Run at 2024-10-30 09:16:39

github-actions[bot] avatar Sep 26 '24 12:09 github-actions[bot]

@nf-core-bot fix linting :pray: pretty please :pray:

maxulysse avatar Oct 04 '24 11:10 maxulysse

@pmoris I updated your PR with the latest update in this function. No need to check for paired samples as sarek only handles paired samples

maxulysse avatar Oct 04 '24 11:10 maxulysse

Can you update the CHANGELOG

maxulysse avatar Oct 04 '24 11:10 maxulysse

Changelog is updated!

I also fixed the conditional (by removing the meta.single_end check, it accidentally moved the negation to the flowcell variable, causing the check to not trigger at the right time).

Lastly, what are your thoughts on updating the PU field to flowcell.lane rather than just lane (as recommended here: https://support.sentieon.com/appnotes/read_groups/)?

pmoris avatar Oct 08 '24 14:10 pmoris

Why is the linter complaining? There is no trailing whitespace or non-multiple-of-4 padding as far as I can tell...

pmoris avatar Oct 10 '24 10:10 pmoris