sarek
Check if flowcell id matches for paired samples
I noticed this comment about checking the flowcell ID for paired samples while constructing GATK read groups. I was adapting the read group code for a custom pipeline and attempted a quick fix, so I thought I'd contribute it back to sarek.
While constructing the read group from paired FASTQ samples, check that the flowcell ID parsed from the first read is the same in fastq_1 and fastq_2. If the IDs differ, exit with an error that reports the problematic sample and file paths.
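The check described above can be sketched roughly as follows. This is an illustrative Python version of the logic, not the Groovy code in this PR. It assumes Illumina-style headers in which the flowcell ID is the third colon-separated field (seven or more fields) or the first field (five fields). The function names `flowcell_id` and `check_paired_flowcell` are hypothetical.

```python
import gzip


def flowcell_id(fastq_path):
    """Parse the flowcell ID from the first read header of a FASTQ file.

    Assumed header layouts (an assumption, not taken from the PR):
      @instrument:run:FLOWCELL:lane:tile:x:y ...   (seven or more fields)
      @FLOWCELL:lane:tile:x:y                      (five fields)
    """
    path = str(fastq_path)
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        header = handle.readline().strip()

    if not header.startswith("@"):
        raise ValueError(f"{path}: first line is not a FASTQ header: {header!r}")

    fields = header[1:].split(":")
    if len(fields) >= 7:
        return fields[2]
    if len(fields) == 5:
        return fields[0]
    raise ValueError(f"{path}: cannot parse flowcell ID from header: {header!r}")


def check_paired_flowcell(sample, fastq_1, fastq_2):
    """Return the shared flowcell ID, or raise if the two files disagree."""
    id_1 = flowcell_id(fastq_1)
    id_2 = flowcell_id(fastq_2)
    if id_1 != id_2:
        # Report the sample and both paths so the user can find the bad pair.
        raise ValueError(
            f"Flowcell ID mismatch for sample {sample}: "
            f"{fastq_1} has {id_1!r}, {fastq_2} has {id_2!r}"
        )
    return id_1
```

Only the first read of each file is inspected, so the check is cheap but will not detect files that mix reads from several flowcells further down.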
Incidentally, while researching read groups I came across the following recommendations: https://support.sentieon.com/appnotes/read_groups/. Would it be worth updating some of the fields to match these guidelines?
PR checklist
- [x] This comment contains a description of changes (with reason).
- [ ] If you've fixed a bug or added code that should be tested, add tests!
- => Only tested this manually, but happy to add a proper test if you could give me a starting point. Is there already an existing test for samplesheet validation that I can add this to? I guess I will need to add "corrupt" fastq files to the nf-core test repo?
- [ ] If you've added a new tool - have you followed the pipeline conventions in the contribution docs?
- [ ] If necessary, also make a PR on the nf-core/sarek branch on the nf-core/test-datasets repository.
- [x] Make sure your code lints (`nf-core lint`).
- [x] Ensure the test suite passes (`nextflow run . -profile test,docker --outdir <OUTDIR>`).
- [x] Check for unexpected warnings in debug mode (`nextflow run . -profile debug,test,docker --outdir <OUTDIR>`).
- [ ] Usage Documentation in `docs/usage.md` is updated.
- [ ] Output Documentation in `docs/output.md` is updated.
- [ ] `CHANGELOG.md` is updated.
- => will do this after submitting the PR so that I can link to it.
- [ ] `README.md` is updated (including new tool citations and authors/contributors).
- => should I do this even for such a minor contribution?