sarek
WIP: Adding support for fastq.gz.spring-files as input
In this PR, I'm trying to add support for fastq.gz.spring-files as input. (This PR replaces the former https://github.com/nf-core/sarek/pull/1522 )
As agreed with @maxulysse, I extended the sample-sheet schema with spring_1 and spring_2.
The spring input can be either a single file containing both R1 and R2 (it goes in spring_1, while spring_2 is left undefined or empty) or two files: one with R1 (goes in spring_1) and one with R2 (goes in spring_2).
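To make the two layouts concrete, here is a minimal Python sketch of how a sample-sheet row could be interpreted. It is not the pipeline's Nextflow code: the function name `classify_spring_input`, the returned labels, and the choice to reject a row that sets only spring_2 are all assumptions made for illustration.

```python
from typing import Optional


def classify_spring_input(spring_1: Optional[str], spring_2: Optional[str]) -> str:
    """Interpret the spring_1/spring_2 columns of one sample-sheet row.

    Hypothetical helper: the names and return labels are illustrative only.
    """
    # Treat both None and empty/whitespace-only strings as "not provided".
    has_1 = bool(spring_1 and spring_1.strip())
    has_2 = bool(spring_2 and spring_2.strip())

    if has_1 and has_2:
        return "two_files"  # spring_1 holds R1, spring_2 holds R2
    if has_1:
        return "one_file"   # spring_1 holds both R1 and R2
    if has_2:
        # Assumption: spring_2 without spring_1 is treated as a sample-sheet error.
        raise ValueError("spring_2 is set but spring_1 is empty")
    return "no_spring"      # row uses some other input type


print(classify_spring_input("sample.fastq.gz.spring", ""))
print(classify_spring_input("sample_R1.fastq.gz.spring", "sample_R2.fastq.gz.spring"))
```

The file names in the two calls are made up; only the spring_1/spring_2 column names come from the schema change described above.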
Disclaimer: Excuse the poorly named variables and modules "instances". Please suggest better names ;-)
TO-DO:
- [ ] - Streamline code
- [ ] - Add docs
- [x] - Add test-data
- [ ] - Add test
- [ ] - Update changelog
PR checklist
- [ ] This comment contains a description of changes (with reason).
- [ ] If you've fixed a bug or added code that should be tested, add tests!
- [ ] If you've added a new tool - have you followed the pipeline conventions in the contribution docs
- [ ] If necessary, also make a PR on the nf-core/sarek branch on the nf-core/test-datasets repository.
- [ ] Make sure your code lints (`nf-core lint`).
- [ ] Ensure the test suite passes (`nf-test test tests/ --verbose --profile +docker`).
- [ ] Check for unexpected warnings in debug mode (`nextflow run . -profile debug,test,docker --outdir <OUTDIR>`).
- [ ] Usage Documentation in `docs/usage.md` is updated.
- [ ] Output Documentation in `docs/output.md` is updated.
- [ ] `CHANGELOG.md` is updated.
- [ ] `README.md` is updated (including new tool citations and authors/contributors).
nf-core lint overall result: Passed :white_check_mark: :warning:
Posted for pipeline commit 67e3f02
+| ✅ 200 tests passed |+
#| ❔ 12 tests were ignored |#
!| ❗ 3 tests had warnings |!
:heavy_exclamation_mark: Test warnings:
- pipeline_todos - TODO string in `main.nf`: Optionally add in-text citation tools to this list.
- pipeline_todos - TODO string in `main.nf`: Optionally add bibliographic entries to this list.
- pipeline_todos - TODO string in `main.nf`: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
:grey_question: Tests ignored:
- files_exist - File is ignored: `.github/workflows/awsfulltest.yml`
- files_exist - File is ignored: `.github/workflows/awstest.yml`
- files_exist - File is ignored: `conf/modules.config`
- files_unchanged - File ignored due to lint config: `.github/PULL_REQUEST_TEMPLATE.md`
- files_unchanged - File ignored due to lint config: `assets/nf-core-sarek_logo_light.png`
- files_unchanged - File ignored due to lint config: `docs/images/nf-core-sarek_logo_light.png`
- files_unchanged - File ignored due to lint config: `docs/images/nf-core-sarek_logo_dark.png`
- files_unchanged - File ignored due to lint config: `.gitignore` or `.prettierignore`
- actions_ci - actions_ci
- actions_awstest - 'awstest.yml' workflow not found: `/home/runner/work/sarek/sarek/.github/workflows/awstest.yml`
- template_strings - template_strings
- modules_config - modules_config
:white_check_mark: Tests passed:
- files_exist - File found: `.gitattributes`
- files_exist - File found: `.gitignore`
- files_exist - File found: `.nf-core.yml`
- files_exist - File found: `.editorconfig`
- files_exist - File found: `.prettierignore`
- files_exist - File found: `.prettierrc.yml`
- files_exist - File found: `CHANGELOG.md`
- files_exist - File found: `CITATIONS.md`
- files_exist - File found: `CODE_OF_CONDUCT.md`
- files_exist - File found: `LICENSE` or `LICENSE.md` or `LICENCE` or `LICENCE.md`
- files_exist - File found: `nextflow_schema.json`
- files_exist - File found: `nextflow.config`
- files_exist - File found: `README.md`
- files_exist - File found: `.github/.dockstore.yml`
- files_exist - File found: `.github/CONTRIBUTING.md`
- files_exist - File found: `.github/ISSUE_TEMPLATE/bug_report.yml`
- files_exist - File found: `.github/ISSUE_TEMPLATE/config.yml`
- files_exist - File found: `.github/ISSUE_TEMPLATE/feature_request.yml`
- files_exist - File found: `.github/PULL_REQUEST_TEMPLATE.md`
- files_exist - File found: `.github/workflows/branch.yml`
- files_exist - File found: `.github/workflows/ci.yml`
- files_exist - File found: `.github/workflows/linting_comment.yml`
- files_exist - File found: `.github/workflows/linting.yml`
- files_exist - File found: `assets/email_template.html`
- files_exist - File found: `assets/email_template.txt`
- files_exist - File found: `assets/sendmail_template.txt`
- files_exist - File found: `assets/nf-core-sarek_logo_light.png`
- files_exist - File found: `conf/test.config`
- files_exist - File found: `conf/test_full.config`
- files_exist - File found: `docs/images/nf-core-sarek_logo_light.png`
- files_exist - File found: `docs/images/nf-core-sarek_logo_dark.png`
- files_exist - File found: `docs/output.md`
- files_exist - File found: `docs/README.md`
- files_exist - File found: `docs/README.md`
- files_exist - File found: `docs/usage.md`
- files_exist - File found: `main.nf`
- files_exist - File found: `assets/multiqc_config.yml`
- files_exist - File found: `conf/base.config`
- files_exist - File found: `conf/igenomes.config`
- files_exist - File found: `modules.json`
- files_exist - File not found check: `.github/ISSUE_TEMPLATE/bug_report.md`
- files_exist - File not found check: `.github/ISSUE_TEMPLATE/feature_request.md`
- files_exist - File not found check: `.github/workflows/push_dockerhub.yml`
- files_exist - File not found check: `.markdownlint.yml`
- files_exist - File not found check: `.nf-core.yaml`
- files_exist - File not found check: `.yamllint.yml`
- files_exist - File not found check: `bin/markdown_to_html.r`
- files_exist - File not found check: `conf/aws.config`
- files_exist - File not found check: `docs/images/nf-core-sarek_logo.png`
- files_exist - File not found check: `lib/Checks.groovy`
- files_exist - File not found check: `lib/Completion.groovy`
- files_exist - File not found check: `lib/NfcoreTemplate.groovy`
- files_exist - File not found check: `lib/Utils.groovy`
- files_exist - File not found check: `lib/Workflow.groovy`
- files_exist - File not found check: `lib/WorkflowMain.groovy`
- files_exist - File not found check: `lib/WorkflowSarek.groovy`
- files_exist - File not found check: `parameters.settings.json`
- files_exist - File not found check: `pipeline_template.yml`
- files_exist - File not found check: `Singularity`
- files_exist - File not found check: `lib/nfcore_external_java_deps.jar`
- files_exist - File not found check: `.travis.yml`
- nextflow_config - Config variable found: `manifest.name`
- nextflow_config - Config variable found: `manifest.nextflowVersion`
- nextflow_config - Config variable found: `manifest.description`
- nextflow_config - Config variable found: `manifest.version`
- nextflow_config - Config variable found: `manifest.homePage`
- nextflow_config - Config variable found: `timeline.enabled`
- nextflow_config - Config variable found: `trace.enabled`
- nextflow_config - Config variable found: `report.enabled`
- nextflow_config - Config variable found: `dag.enabled`
- nextflow_config - Config variable found: `process.cpus`
- nextflow_config - Config variable found: `process.memory`
- nextflow_config - Config variable found: `process.time`
- nextflow_config - Config variable found: `params.outdir`
- nextflow_config - Config variable found: `params.input`
- nextflow_config - Config variable found: `params.validationShowHiddenParams`
- nextflow_config - Config variable found: `params.validationSchemaIgnoreParams`
- nextflow_config - Config variable found: `manifest.mainScript`
- nextflow_config - Config variable found: `timeline.file`
- nextflow_config - Config variable found: `trace.file`
- nextflow_config - Config variable found: `report.file`
- nextflow_config - Config variable found: `dag.file`
- nextflow_config - Config variable (correctly) not found: `params.nf_required_version`
- nextflow_config - Config variable (correctly) not found: `params.container`
- nextflow_config - Config variable (correctly) not found: `params.singleEnd`
- nextflow_config - Config variable (correctly) not found: `params.igenomesIgnore`
- nextflow_config - Config variable (correctly) not found: `params.name`
- nextflow_config - Config variable (correctly) not found: `params.enable_conda`
- nextflow_config - Config `timeline.enabled` had correct value: `true`
- nextflow_config - Config `report.enabled` had correct value: `true`
- nextflow_config - Config `trace.enabled` had correct value: `true`
- nextflow_config - Config `dag.enabled` had correct value: `true`
- nextflow_config - Config `manifest.name` began with `nf-core/`
- nextflow_config - Config variable `manifest.homePage` began with https://github.com/nf-core/
- nextflow_config - Config `dag.file` ended with `.html`
- nextflow_config - Config variable `manifest.nextflowVersion` started with >= or !>=
- nextflow_config - Config `manifest.version` ends in `dev`: `3.5.0dev`
- nextflow_config - Config `params.custom_config_version` is set to `master`
- nextflow_config - Config `params.custom_config_base` is set to `https://raw.githubusercontent.com/nf-core/configs/master`
- nextflow_config - Lines for loading custom profiles found
- nextflow_config - nextflow.config contains configuration profile `test`
- nextflow_config - Config default value correct: params.step= mapping
- nextflow_config - Config default value correct: params.split_fastq= 50000000
- nextflow_config - Config default value correct: params.nucleotides_per_second= 200000
- nextflow_config - Config default value correct: params.clip_r1= 0
- nextflow_config - Config default value correct: params.clip_r2= 0
- nextflow_config - Config default value correct: params.three_prime_clip_r1= 0
- nextflow_config - Config default value correct: params.three_prime_clip_r2= 0
- nextflow_config - Config default value correct: params.trim_nextseq= 0
- nextflow_config - Config default value correct: params.group_by_umi_strategy= Adjacency
- nextflow_config - Config default value correct: params.aligner= bwa-mem
- nextflow_config - Config default value correct: params.ascat_min_base_qual= 20
- nextflow_config - Config default value correct: params.ascat_min_counts= 10
- nextflow_config - Config default value correct: params.ascat_min_map_qual= 35
- nextflow_config - Config default value correct: params.cf_coeff= 0.05
- nextflow_config - Config default value correct: params.cf_contamination= 0
- nextflow_config - Config default value correct: params.cf_minqual= 0
- nextflow_config - Config default value correct: params.cf_mincov= 0
- nextflow_config - Config default value correct: params.cf_ploidy= 2
- nextflow_config - Config default value correct: params.sentieon_haplotyper_emit_mode= variant
- nextflow_config - Config default value correct: params.sentieon_dnascope_emit_mode= variant
- nextflow_config - Config default value correct: params.sentieon_dnascope_pcr_indel_model= CONSERVATIVE
- nextflow_config - Config default value correct: params.dbnsfp_fields= rs_dbSNP,HGVSc_VEP,HGVSp_VEP,1000Gp3_EAS_AF,1000Gp3_AMR_AF,LRT_score,GERP++_RS,gnomAD_exomes_AF
- nextflow_config - Config default value correct: params.vep_custom_args= --everything --filter_common --per_gene --total_length --offline --format vcf
- nextflow_config - Config default value correct: params.vep_version= 111.0-0
- nextflow_config - Config default value correct: params.vep_out_format= vcf
- nextflow_config - Config default value correct: params.genome= GATK.GRCh38
- nextflow_config - Config default value correct: params.igenomes_base= s3://ngi-igenomes/igenomes/
- nextflow_config - Config default value correct: params.vep_cache= s3://annotation-cache/vep_cache/
- nextflow_config - Config default value correct: params.snpeff_cache= s3://annotation-cache/snpeff_cache/
- nextflow_config - Config default value correct: params.custom_config_version= master
- nextflow_config - Config default value correct: params.custom_config_base= https://raw.githubusercontent.com/nf-core/configs/master
- nextflow_config - Config default value correct: params.test_data_base= https://raw.githubusercontent.com/nf-core/test-datasets/sarek3
- nextflow_config - Config default value correct: params.seq_platform= ILLUMINA
- nextflow_config - Config default value correct: params.max_cpus= 16
- nextflow_config - Config default value correct: params.max_memory= 128.GB
- nextflow_config - Config default value correct: params.max_time= 240.h
- nextflow_config - Config default value correct: params.publish_dir_mode= copy
- nextflow_config - Config default value correct: params.max_multiqc_email_size= 25.MB
- nextflow_config - Config default value correct: params.validate_params= true
- nextflow_config - Config default value correct: params.pipelines_testdata_base_path= https://raw.githubusercontent.com/nf-core/test-datasets/
- files_unchanged - `.gitattributes` matches the template
- files_unchanged - `.prettierrc.yml` matches the template
- files_unchanged - `CODE_OF_CONDUCT.md` matches the template
- files_unchanged - `LICENSE` matches the template
- files_unchanged - `.github/.dockstore.yml` matches the template
- files_unchanged - `.github/CONTRIBUTING.md` matches the template
- files_unchanged - `.github/ISSUE_TEMPLATE/bug_report.yml` matches the template
- files_unchanged - `.github/ISSUE_TEMPLATE/config.yml` matches the template
- files_unchanged - `.github/ISSUE_TEMPLATE/feature_request.yml` matches the template
- files_unchanged - `.github/workflows/branch.yml` matches the template
- files_unchanged - `.github/workflows/linting_comment.yml` matches the template
- files_unchanged - `.github/workflows/linting.yml` matches the template
- files_unchanged - `assets/email_template.html` matches the template
- files_unchanged - `assets/email_template.txt` matches the template
- files_unchanged - `assets/sendmail_template.txt` matches the template
- files_unchanged - `docs/README.md` matches the template
- readme - README Nextflow minimum version badge matched config. Badge: `23.04.0`, Config: `23.04.0`
- readme - README Zenodo placeholder was replaced with DOI.
- pipeline_name_conventions - Name adheres to nf-core convention
- schema_lint - Schema lint passed
- schema_lint - Schema title + description lint passed
- schema_lint - Input mimetype lint passed: 'text/csv'
- schema_params - Schema matched params returned from nextflow config
- system_exit - No `System.exit` calls found
- actions_schema_validation - Workflow validation passed: branch.yml
- actions_schema_validation - Workflow validation passed: ci.yml
- actions_schema_validation - Workflow validation passed: fix-linting.yml
- actions_schema_validation - Workflow validation passed: linting.yml
- actions_schema_validation - Workflow validation passed: download_pipeline.yml
- actions_schema_validation - Workflow validation passed: release-announcements.yml
- actions_schema_validation - Workflow validation passed: clean-up.yml
- actions_schema_validation - Workflow validation passed: linting_comment.yml
- actions_schema_validation - Workflow validation passed: cloudtest.yml
- actions_schema_validation - Workflow validation passed: ncbench.yml
- merge_markers - No merge markers found in pipeline files
- modules_json - Only installed modules found in `modules.json`
- multiqc_config - `assets/multiqc_config.yml` found and not ignored.
- multiqc_config - `assets/multiqc_config.yml` contains `report_section_order`
- multiqc_config - `assets/multiqc_config.yml` contains `export_plots`
- multiqc_config - `assets/multiqc_config.yml` contains `report_comment`
- multiqc_config - `assets/multiqc_config.yml` follows the ordering scheme of the minimally required plugins.
- multiqc_config - `assets/multiqc_config.yml` contains a matching 'report_comment'.
- multiqc_config - `assets/multiqc_config.yml` contains 'export_plots: true'.
- modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'
- base_config - `conf/base.config` found and not ignored.
- base_config - `UNZIP` found in `conf/base.config` and Nextflow scripts.
- base_config - `FASTQC` found in `conf/base.config` and Nextflow scripts.
- base_config - `FASTP` found in `conf/base.config` and Nextflow scripts.
- base_config - `BWAMEM1_MEM` found in `conf/base.config` and Nextflow scripts.
- base_config - `CNVKIT_BATCH` found in `conf/base.config` and Nextflow scripts.
- base_config - `GATK4_MARKDUPLICATES` found in `conf/base.config` and Nextflow scripts.
- base_config - `GATK4_APPLYBQSR` found in `conf/base.config` and Nextflow scripts.
- base_config - `MOSDEPTH` found in `conf/base.config` and Nextflow scripts.
- base_config - `STRELKA` found in `conf/base.config` and Nextflow scripts.
- base_config - `SAMTOOLS_CONVERT` found in `conf/base.config` and Nextflow scripts.
- base_config - `GATK4_MERGEVCFS` found in `conf/base.config` and Nextflow scripts.
- base_config - `MULTIQC` found in `conf/base.config` and Nextflow scripts.
- nfcore_yml - Repository type in `.nf-core.yml` is valid: `pipeline`
- nfcore_yml - nf-core version in `.nf-core.yml` is set to the latest version: `2.14.1`
Run details
- nf-core/tools version 2.14.1
- Run at 2024-06-18 13:20:11
Something wrong here :-/
The test
nextflow run main.nf -profile test,alignment_from_fastq_and_spring,docker --outdir results -resume --save_mapped --save_output_as_bam
results in bam and bai in different subfolders:
results/preprocessing/mapped/test/test-test_L1.sorted.bam.bai
results/preprocessing/mapped/test-test_L1/test-test_L1.sorted.bam
etc.
The bug must have something to do with meta.id being set to ${meta.sample}-${meta.lane}.
EDIT: It also seems to be a problem on the dev and master branches 😬 🤨 When I run a very similar test on dev or master:
nextflow run main.nf -profile test,alignment_to_fastq,docker --outdir results -resume --save_mapped --save_output_as_bam
I get:
results/preprocessing/mapped/test/test-1.sorted.bam.bai
results/preprocessing/mapped/test-1/test-1.sorted.bam
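To illustrate the suspicion about meta.id, here is a small Python sketch (not sarek code) that reproduces the observed layout under one assumption: the BAM is published into a folder named after meta.id, while the index is published into a folder named after meta.sample. Which key each output really uses is a guess; `publish_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def publish_path(outdir: str, subdir_key: str, filename: str) -> str:
    """Build a publish path under preprocessing/mapped/<subdir_key>/ (hypothetical helper)."""
    return str(PurePosixPath(outdir) / "preprocessing" / "mapped" / subdir_key / filename)


# Values taken from the test run above; meta.id is derived as "${meta.sample}-${meta.lane}".
meta = {"sample": "test", "lane": "test_L1"}
meta["id"] = f"{meta['sample']}-{meta['lane']}"

# Assumption: the BAM folder is keyed on meta.id, the BAI folder on meta.sample.
bam = publish_path("results", meta["id"], f"{meta['id']}.sorted.bam")
bai = publish_path("results", meta["sample"], f"{meta['id']}.sorted.bam.bai")

print(bam)
print(bai)
```

If the assumption holds, keying both outputs on the same field would put the bam and bai next to each other again.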
I expect that problem to be solved by https://github.com/nf-core/sarek/pull/1541
I think publishing should be disabled by default. It bloats the output directory massively and likely folks don't need the converted fastq files in most cases
Ok. I'll try to disable that.
@FriederikeHanssen what do you mean by overview map?
subway map will be done on a separate issue as I'll do it rather than @asp8200