
WIP: Adding support for fastq.gz.spring-files as input

Open asp8200 opened this issue 1 year ago • 2 comments

In this PR, I'm trying to add support for fastq.gz.spring files as input. This PR replaces the earlier PR https://github.com/nf-core/sarek/pull/1522.

As agreed with @maxulysse, I extended the sample-sheet schema with spring_1 and spring_2.

The input can be either a single spring file containing both R1 and R2 (given in spring_1, with spring_2 left undefined or empty), or two spring files: one with R1 (given in spring_1) and one with R2 (given in spring_2).
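A minimal Python sketch of how a samplesheet row could be interpreted under the rule above. Only the `spring_1` and `spring_2` columns come from this PR; the other columns, the file names and the helper `classify_spring_row` are hypothetical and for illustration only (the pipeline itself implements this logic in Nextflow, not Python).

```python
import csv
import io

# Hypothetical samplesheet: only spring_1/spring_2 come from this PR;
# the other columns and all file names are illustrative assumptions.
SAMPLESHEET = """patient,sample,lane,spring_1,spring_2
p1,s1,L1,s1_L1_R1R2.fastq.gz.spring,
p1,s1,L2,s1_L2_R1.fastq.gz.spring,s1_L2_R2.fastq.gz.spring
"""


def classify_spring_row(row):
    """Hypothetical helper: 'single' if one spring file holds R1 and R2, 'paired' if they are separate files."""
    spring_1 = (row.get("spring_1") or "").strip()
    spring_2 = (row.get("spring_2") or "").strip()
    if not spring_1:
        raise ValueError("spring_1 is required when spring input is used")
    # spring_2 undefined or empty -> spring_1 contains both R1 and R2
    return "paired" if spring_2 else "single"


rows = list(csv.DictReader(io.StringIO(SAMPLESHEET)))
for row in rows:
    print(row["lane"], classify_spring_row(row))
```

Treating a missing column and an empty cell the same way mirrors the "undefined or empty" wording for spring_2.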

Disclaimer: Excuse the poorly named variables and module "instances". Please suggest better names ;-)

TO-DO:

  • [ ] - Streamline code
  • [ ] - Add docs
  • [x] - Add test-data
  • [ ] - Add test
  • [ ] - Update changelog

PR checklist

  • [ ] This comment contains a description of changes (with reason).
  • [ ] If you've fixed a bug or added code that should be tested, add tests!
  • [ ] If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • [ ] If necessary, also make a PR on the nf-core/sarek branch on the nf-core/test-datasets repository.
  • [ ] Make sure your code lints (nf-core lint).
  • [ ] Ensure the test suite passes (nf-test test tests/ --verbose --profile +docker).
  • [ ] Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • [ ] Usage Documentation in docs/usage.md is updated.
  • [ ] Output Documentation in docs/output.md is updated.
  • [ ] CHANGELOG.md is updated.
  • [ ] README.md is updated (including new tool citations and authors/contributors).

asp8200 · May 21 '24 16:05

nf-core lint overall result: Passed :white_check_mark: :warning:

Posted for pipeline commit 67e3f02

✅ 200 tests passed
❔ 12 tests were ignored
❗ 3 tests had warnings

:heavy_exclamation_mark: Test warnings:

  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!

:grey_question: Tests ignored:

  • files_exist - File is ignored: .github/workflows/awsfulltest.yml
  • files_exist - File is ignored: .github/workflows/awstest.yml
  • files_exist - File is ignored: conf/modules.config
  • files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
  • files_unchanged - File ignored due to lint config: assets/nf-core-sarek_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-sarek_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-sarek_logo_dark.png
  • files_unchanged - File ignored due to lint config: .gitignore or .prettierignore
  • actions_ci - actions_ci
  • actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/sarek/sarek/.github/workflows/awstest.yml
  • template_strings - template_strings
  • modules_config - modules_config

:white_check_mark: Tests passed:

Run details

  • nf-core/tools version 2.14.1
  • Run at 2024-06-18 13:20:11

github-actions[bot] · May 21 '24 16:05

Something is wrong here :-/

The test

nextflow run main.nf -profile test,alignment_from_fastq_and_spring,docker --outdir results -resume --save_mapped --save_output_as_bam

puts the BAM file and its BAI index in different subfolders:

results/preprocessing/mapped/test/test-test_L1.sorted.bam.bai
results/preprocessing/mapped/test-test_L1/test-test_L1.sorted.bam

etc.

The bug must have something to do with meta.id being set to ${meta.sample}-${meta.lane}.
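To illustrate the symptom, here is a minimal Python sketch. It assumes the BAM's publish subfolder is chosen by `meta.id` while the BAI's is chosen by `meta.sample`; that split is inferred from the observed paths above, not confirmed from the pipeline's publish configuration. `publish_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath

OUTDIR = PurePosixPath("results/preprocessing/mapped")

# Values from the failing test above; meta.id is built as "${meta.sample}-${meta.lane}".
meta = {"sample": "test", "lane": "test_L1"}
meta["id"] = f"{meta['sample']}-{meta['lane']}"


def publish_path(subdir_key, filename):
    """Hypothetical publish rule: the subfolder is picked from a single meta key."""
    return OUTDIR / meta[subdir_key] / filename


# Assumption: the BAM rule keys on meta.id, the BAI rule on meta.sample.
bam = publish_path("id", f"{meta['id']}.sorted.bam")
bai = publish_path("sample", f"{meta['id']}.sorted.bam.bai")
print(bam)  # results/preprocessing/mapped/test-test_L1/test-test_L1.sorted.bam
print(bai)  # results/preprocessing/mapped/test/test-test_L1.sorted.bam.bai
```

Under this assumption, any key mismatch between the two rules splits the pair as soon as meta.id differs from meta.sample.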

EDIT: This also seems to be a problem on the dev and master branches 😬 🤨 When I run a very similar test on the dev or master branch:

nextflow run main.nf -profile test,alignment_to_fastq,docker --outdir results -resume --save_mapped --save_output_as_bam

I get:

results/preprocessing/mapped/test/test-1.sorted.bam.bai
results/preprocessing/mapped/test-1/test-1.sorted.bam

asp8200 · May 22 '24 17:05

I expect that problem to be solved by https://github.com/nf-core/sarek/pull/1541

asp8200 · May 24 '24 13:05

I think publishing should be disabled by default. It bloats the output directory massively, and in most cases folks likely don't need the converted FASTQ files.

FriederikeHanssen · May 30 '24 08:05

Ok. I'll try to disable that.

asp8200 · May 30 '24 08:05

@FriederikeHanssen what do you mean by overview map?

maxulysse · Jun 18 '24 15:06

The subway map will be handled in a separate issue, since I'll do it rather than @asp8200.

maxulysse · Jun 18 '24 15:06