scrnaseq
scrnaseq copied to clipboard
Adding cellrangermulti subworkflow
Close #247 Close #313
PR checklist
- [ ] This comment contains a description of changes (with reason).
- [ ] If you've fixed a bug or added code that should be tested, add tests!
- [ ] If you've added a new tool - have you followed the pipeline conventions in the contribution docs
- [ ] If necessary, also make a PR on the nf-core/scrnaseq branch on the nf-core/test-datasets repository.
- [ ] Make sure your code lints (
nf-core lint). - [ ] Ensure the test suite passes (
nextflow run . -profile test,docker --outdir <OUTDIR>). - [ ] Usage Documentation in
docs/usage.mdis updated. - [ ] Output Documentation in
docs/output.mdis updated. - [ ]
CHANGELOG.mdis updated. - [ ]
README.mdis updated (including new tool citations and authors/contributors).
Context
Hi guys,
Although not finished yet because it would still required updating the parameters schema, defaults and documentation, I am already opening the PR so we can all take a look at it and discuss any modifications required before merging and also, give it a round of tests and define how we want some parameters to be.
I used the templates provided by @klkeys
Usage context
samplesheet To use it, samplesheet requires an additional parameter so that we can properly mix the different feature types given per sample.
sample,fastq_1,fastq_2,feature_type,protocol,expected_cells
PBMC_10K,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/10xgenomics/cellranger/10k_pbmc/fastqs/5gex/5gex/subsampled_sc5p_v2_hs_PBMC_10k_5gex_S1_L001_R1_001.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/10xgenomics/cellranger/10k_pbmc/fastqs/5gex/5gex/subsampled_sc5p_v2_hs_PBMC_10k_5gex_S1_L001_R2_001.fastq.gz,gex,SC5P-PE,1000
PBMC_10K,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/10xgenomics/cellranger/10k_pbmc/fastqs/bcell/subsampled_sc5p_v2_hs_PBMC_10k_b_S1_L001_R1_001.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/10xgenomics/cellranger/10k_pbmc/fastqs/bcell/subsampled_sc5p_v2_hs_PBMC_10k_b_S1_L001_R2_001.fastq.gz,vdj,SC5P-PE,1000
PBMC_10K,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/10xgenomics/cellranger/10k_pbmc/fastqs/5gex/5fb/subsampled_sc5p_v2_hs_PBMC_10k_5fb_S1_L001_R1_001.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/10xgenomics/cellranger/10k_pbmc/fastqs/5gex/5fb/subsampled_sc5p_v2_hs_PBMC_10k_5fb_S1_L001_R2_001.fastq.gz,ab,SC5P-PE,1000
PBMC_10K_CMO,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/10xgenomics/cellranger/10k_pbmc_cmo/fastqs/gex_1/subsampled_SC3_v3_NextGem_DI_CellPlex_Human_PBMC_10K_1_gex_S2_L001_R1_001.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/10xgenomics/cellranger/10k_pbmc_cmo/fastqs/gex_1/subsampled_SC3_v3_NextGem_DI_CellPlex_Human_PBMC_10K_1_gex_S2_L001_R2_001.fastq.gz,gex,SC3Pv3,1000
PBMC_10K_CMO,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/10xgenomics/cellranger/10k_pbmc_cmo/fastqs/cmo/subsampled_SC3_v3_NextGem_DI_CellPlex_Human_PBMC_10K_1_multiplexing_capture_S1_L001_R1_001.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/10xgenomics/cellranger/10k_pbmc_cmo/fastqs/cmo/subsampled_SC3_v3_NextGem_DI_CellPlex_Human_PBMC_10K_1_multiplexing_capture_S1_L001_R2_001.fastq.gz,cmo,SC3Pv3,1000
PBMC_10K_CMV,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/10xgenomics/cellranger/5k_cmvpos_tcells/fastqs/gex_1/subsampled_5k_human_antiCMV_T_TBNK_connect_GEX_1_S1_L001_R1_001.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/10xgenomics/cellranger/5k_cmvpos_tcells/fastqs/gex_1/subsampled_5k_human_antiCMV_T_TBNK_connect_GEX_1_S1_L001_R2_001.fastq.gz,gex,SC5P-R2,1000
PBMC_10K_CMV,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/10xgenomics/cellranger/5k_cmvpos_tcells/fastqs/ab/subsampled_5k_human_antiCMV_T_TBNK_connect_AB_S2_L004_R1_001.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/10xgenomics/cellranger/5k_cmvpos_tcells/fastqs/ab/subsampled_5k_human_antiCMV_T_TBNK_connect_AB_S2_L004_R2_001.fastq.gz,ab,SC5P-R2,1000
PBMC_10K_CMV,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/10xgenomics/cellranger/5k_cmvpos_tcells/fastqs/vdj/subsampled_5k_human_antiCMV_T_TBNK_connect_VDJ_S1_L001_R1_001.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/10xgenomics/cellranger/5k_cmvpos_tcells/fastqs/vdj/subsampled_5k_human_antiCMV_T_TBNK_connect_VDJ_S1_L001_R2_001.fastq.gz,vdj,SC5P-R2,1000
Supporting files
Right now, all the supporting files have been added as parameters, for example, cmo_barcode_csv, beam_antigen_csv, etc. .... which means they will work in a dataset manner, being the same for everything given in the samplesheet, instead of samplesheet base if they were added as columns in the samplesheet.
My main question here is, what should it be the desired approach?
Other stuff Of course there might still have things to clear or finish that I might have overlooked since there is quite a lot on it, so, I request your help on spotting it.
testing case required the full genomes from ensembl, otherwise, the analysis using VDJ was failling.
Python linting (black) is failing
To keep the code consistent with lots of contributors, we run automated code consistency checks. To fix this CI test, please run:
- Install
black:pip install black - Fix formatting errors in your pipeline:
black .
Once you push these changes the test should pass, and you can hide this comment :+1:
We highly recommend setting up Black in your code editor so that this formatting is done automatically on save. Ask about it on Slack for help!
Thanks again for your contribution!
nf-core lint overall result: Passed :white_check_mark: :warning:
Posted for pipeline commit 4d9f17e
+| ✅ 203 tests passed |+
#| ❔ 4 tests were ignored |#
!| ❗ 4 tests had warnings |!
:heavy_exclamation_mark: Test warnings:
- nextflow_config - Config
manifest.versionshould end indev:2.6.0 - pipeline_todos - TODO string in
main.nf: Optionally add in-text citation tools to this list. - pipeline_todos - TODO string in
main.nf: Optionally add bibliographic entries to this list. - pipeline_todos - TODO string in
main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
:grey_question: Tests ignored:
- files_exist - File is ignored:
lib/Utils.groovy - files_unchanged - File ignored due to lint config:
.github/ISSUE_TEMPLATE/bug_report.yml - template_strings - template_strings
- schema_params - schema_params
:white_check_mark: Tests passed:
- files_exist - File found:
.gitattributes - files_exist - File found:
.gitignore - files_exist - File found:
.nf-core.yml - files_exist - File found:
.editorconfig - files_exist - File found:
.prettierignore - files_exist - File found:
.prettierrc.yml - files_exist - File found:
CHANGELOG.md - files_exist - File found:
CITATIONS.md - files_exist - File found:
CODE_OF_CONDUCT.md - files_exist - File found:
LICENSEorLICENSE.mdorLICENCEorLICENCE.md - files_exist - File found:
nextflow_schema.json - files_exist - File found:
nextflow.config - files_exist - File found:
README.md - files_exist - File found:
.github/.dockstore.yml - files_exist - File found:
.github/CONTRIBUTING.md - files_exist - File found:
.github/ISSUE_TEMPLATE/bug_report.yml - files_exist - File found:
.github/ISSUE_TEMPLATE/config.yml - files_exist - File found:
.github/ISSUE_TEMPLATE/feature_request.yml - files_exist - File found:
.github/PULL_REQUEST_TEMPLATE.md - files_exist - File found:
.github/workflows/branch.yml - files_exist - File found:
.github/workflows/ci.yml - files_exist - File found:
.github/workflows/linting_comment.yml - files_exist - File found:
.github/workflows/linting.yml - files_exist - File found:
assets/email_template.html - files_exist - File found:
assets/email_template.txt - files_exist - File found:
assets/sendmail_template.txt - files_exist - File found:
assets/nf-core-scrnaseq_logo_light.png - files_exist - File found:
conf/modules.config - files_exist - File found:
conf/test.config - files_exist - File found:
conf/test_full.config - files_exist - File found:
docs/images/nf-core-scrnaseq_logo_light.png - files_exist - File found:
docs/images/nf-core-scrnaseq_logo_dark.png - files_exist - File found:
docs/output.md - files_exist - File found:
docs/README.md - files_exist - File found:
docs/README.md - files_exist - File found:
docs/usage.md - files_exist - File found:
main.nf - files_exist - File found:
assets/multiqc_config.yml - files_exist - File found:
conf/base.config - files_exist - File found:
conf/igenomes.config - files_exist - File found:
.github/workflows/awstest.yml - files_exist - File found:
.github/workflows/awsfulltest.yml - files_exist - File found:
modules.json - files_exist - File not found check:
.github/ISSUE_TEMPLATE/bug_report.md - files_exist - File not found check:
.github/ISSUE_TEMPLATE/feature_request.md - files_exist - File not found check:
.github/workflows/push_dockerhub.yml - files_exist - File not found check:
.markdownlint.yml - files_exist - File not found check:
.nf-core.yaml - files_exist - File not found check:
.yamllint.yml - files_exist - File not found check:
bin/markdown_to_html.r - files_exist - File not found check:
conf/aws.config - files_exist - File not found check:
docs/images/nf-core-scrnaseq_logo.png - files_exist - File not found check:
lib/Checks.groovy - files_exist - File not found check:
lib/Completion.groovy - files_exist - File not found check:
lib/NfcoreTemplate.groovy - files_exist - File not found check:
lib/Workflow.groovy - files_exist - File not found check:
lib/WorkflowMain.groovy - files_exist - File not found check:
lib/WorkflowScrnaseq.groovy - files_exist - File not found check:
parameters.settings.json - files_exist - File not found check:
pipeline_template.yml - files_exist - File not found check:
Singularity - files_exist - File not found check:
lib/nfcore_external_java_deps.jar - files_exist - File not found check:
.travis.yml - nextflow_config - Config variable found:
manifest.name - nextflow_config - Config variable found:
manifest.nextflowVersion - nextflow_config - Config variable found:
manifest.description - nextflow_config - Config variable found:
manifest.version - nextflow_config - Config variable found:
manifest.homePage - nextflow_config - Config variable found:
timeline.enabled - nextflow_config - Config variable found:
trace.enabled - nextflow_config - Config variable found:
report.enabled - nextflow_config - Config variable found:
dag.enabled - nextflow_config - Config variable found:
process.cpus - nextflow_config - Config variable found:
process.memory - nextflow_config - Config variable found:
process.time - nextflow_config - Config variable found:
params.outdir - nextflow_config - Config variable found:
params.input - nextflow_config - Config variable found:
params.validationShowHiddenParams - nextflow_config - Config variable found:
params.validationSchemaIgnoreParams - nextflow_config - Config variable found:
manifest.mainScript - nextflow_config - Config variable found:
timeline.file - nextflow_config - Config variable found:
trace.file - nextflow_config - Config variable found:
report.file - nextflow_config - Config variable found:
dag.file - nextflow_config - Config variable (correctly) not found:
params.nf_required_version - nextflow_config - Config variable (correctly) not found:
params.container - nextflow_config - Config variable (correctly) not found:
params.singleEnd - nextflow_config - Config variable (correctly) not found:
params.igenomesIgnore - nextflow_config - Config variable (correctly) not found:
params.name - nextflow_config - Config variable (correctly) not found:
params.enable_conda - nextflow_config - Config
timeline.enabledhad correct value:true - nextflow_config - Config
report.enabledhad correct value:true - nextflow_config - Config
trace.enabledhad correct value:true - nextflow_config - Config
dag.enabledhad correct value:true - nextflow_config - Config
manifest.namebegan withnf-core/ - nextflow_config - Config variable
manifest.homePagebegan with https://github.com/nf-core/ - nextflow_config - Config
dag.fileended with.html - nextflow_config - Config variable
manifest.nextflowVersionstarted with >= or !>= - nextflow_config - Config
params.custom_config_versionis set tomaster - nextflow_config - Config
params.custom_config_baseis set tohttps://raw.githubusercontent.com/nf-core/configs/master - nextflow_config - Lines for loading custom profiles found
- nextflow_config - nextflow.config contains configuration profile
test - nextflow_config - Config default value correct: params.aligner= alevin
- nextflow_config - Config default value correct: params.protocol= auto
- nextflow_config - Config default value correct: params.igenomes_base= s3://ngi-igenomes/igenomes/
- nextflow_config - Config default value correct: params.simpleaf_rlen= 91
- nextflow_config - Config default value correct: params.star_feature= Gene
- nextflow_config - Config default value correct: params.kb_workflow= standard
- nextflow_config - Config default value correct: params.custom_config_version= master
- nextflow_config - Config default value correct: params.custom_config_base= https://raw.githubusercontent.com/nf-core/configs/master
- nextflow_config - Config default value correct: params.max_cpus= 16
- nextflow_config - Config default value correct: params.max_memory= 128.GB
- nextflow_config - Config default value correct: params.max_time= 240.h
- nextflow_config - Config default value correct: params.publish_dir_mode= copy
- nextflow_config - Config default value correct: params.max_multiqc_email_size= 25.MB
- nextflow_config - Config default value correct: params.validate_params= true
- nextflow_config - Config default value correct: params.pipelines_testdata_base_path= https://raw.githubusercontent.com/nf-core/test-datasets/
- files_unchanged -
.gitattributesmatches the template - files_unchanged -
.prettierrc.ymlmatches the template - files_unchanged -
CODE_OF_CONDUCT.mdmatches the template - files_unchanged -
LICENSEmatches the template - files_unchanged -
.github/.dockstore.ymlmatches the template - files_unchanged -
.github/CONTRIBUTING.mdmatches the template - files_unchanged -
.github/ISSUE_TEMPLATE/config.ymlmatches the template - files_unchanged -
.github/ISSUE_TEMPLATE/feature_request.ymlmatches the template - files_unchanged -
.github/PULL_REQUEST_TEMPLATE.mdmatches the template - files_unchanged -
.github/workflows/branch.ymlmatches the template - files_unchanged -
.github/workflows/linting_comment.ymlmatches the template - files_unchanged -
.github/workflows/linting.ymlmatches the template - files_unchanged -
assets/email_template.htmlmatches the template - files_unchanged -
assets/email_template.txtmatches the template - files_unchanged -
assets/sendmail_template.txtmatches the template - files_unchanged -
assets/nf-core-scrnaseq_logo_light.pngmatches the template - files_unchanged -
docs/images/nf-core-scrnaseq_logo_light.pngmatches the template - files_unchanged -
docs/images/nf-core-scrnaseq_logo_dark.pngmatches the template - files_unchanged -
docs/README.mdmatches the template - files_unchanged -
.gitignorematches the template - files_unchanged -
.prettierignorematches the template - actions_ci - '.github/workflows/ci.yml' is triggered on expected events
- actions_ci - '.github/workflows/ci.yml' checks minimum NF version
- actions_awstest - '.github/workflows/awstest.yml' is triggered correctly
- actions_awsfulltest -
.github/workflows/awsfulltest.ymlis triggered correctly - actions_awsfulltest -
.github/workflows/awsfulltest.ymldoes not use-profile test - readme - README Nextflow minimum version badge matched config. Badge:
23.04.0, Config:23.04.0 - readme - README Zenodo placeholder was replaced with DOI.
- pipeline_name_conventions - Name adheres to nf-core convention
- schema_lint - Schema lint passed
- schema_lint - Schema title + description lint passed
- schema_lint - Input mimetype lint passed: 'text/csv'
- system_exit - No
System.exitcalls found - actions_schema_validation - Workflow validation passed: branch.yml
- actions_schema_validation - Workflow validation passed: ci.yml
- actions_schema_validation - Workflow validation passed: awsfulltest.yml
- actions_schema_validation - Workflow validation passed: fix-linting.yml
- actions_schema_validation - Workflow validation passed: linting.yml
- actions_schema_validation - Workflow validation passed: download_pipeline.yml
- actions_schema_validation - Workflow validation passed: release-announcements.yml
- actions_schema_validation - Workflow validation passed: clean-up.yml
- actions_schema_validation - Workflow validation passed: awstest.yml
- actions_schema_validation - Workflow validation passed: linting_comment.yml
- merge_markers - No merge markers found in pipeline files
- modules_json - Only installed modules found in
modules.json - multiqc_config -
assets/multiqc_config.ymlfound and not ignored. - multiqc_config -
assets/multiqc_config.ymlcontainsreport_section_order - multiqc_config -
assets/multiqc_config.ymlcontainsexport_plots - multiqc_config -
assets/multiqc_config.ymlcontainsreport_comment - multiqc_config -
assets/multiqc_config.ymlfollows the ordering scheme of the minimally required plugins. - multiqc_config -
assets/multiqc_config.ymlcontains a matching 'report_comment'. - multiqc_config -
assets/multiqc_config.ymlcontains 'export_plots: true'. - modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'
- base_config -
conf/base.configfound and not ignored. - modules_config -
conf/modules.configfound and not ignored. - modules_config -
FASTQCfound inconf/modules.configand Nextflow scripts. - modules_config -
MULTIQCfound inconf/modules.configand Nextflow scripts. - modules_config -
EMPTYDROPS_CELL_CALLINGfound inconf/modules.configand Nextflow scripts. - modules_config -
MTX_TO_H5ADfound inconf/modules.configand Nextflow scripts. - modules_config -
GTF_GENE_FILTERfound inconf/modules.configand Nextflow scripts. - modules_config -
CELLRANGER_MKGTFfound inconf/modules.configand Nextflow scripts. - modules_config -
CELLRANGER_MKREFfound inconf/modules.configand Nextflow scripts. - modules_config -
CELLRANGER_COUNTfound inconf/modules.configand Nextflow scripts. - modules_config -
CELLRANGERARC_MKGTFfound inconf/modules.configand Nextflow scripts. - modules_config -
CELLRANGERARC_MKREFfound inconf/modules.configand Nextflow scripts. - modules_config -
CELLRANGERARC_COUNTfound inconf/modules.configand Nextflow scripts. - modules_config -
CELLRANGER_MKGTFfound inconf/modules.configand Nextflow scripts. - modules_config -
CELLRANGER_MKREFfound inconf/modules.configand Nextflow scripts. - modules_config -
UNIVERSCfound inconf/modules.configand Nextflow scripts. - modules_config -
GFFREAD_TXP2GENEfound inconf/modules.configand Nextflow scripts. - modules_config -
SIMPLEAF_INDEXfound inconf/modules.configand Nextflow scripts. - modules_config -
SIMPLEAF_QUANTfound inconf/modules.configand Nextflow scripts. - modules_config -
ALEVINQCfound inconf/modules.configand Nextflow scripts. - modules_config -
STAR_ALIGNfound inconf/modules.configand Nextflow scripts. - modules_config -
STAR_GENOMEGENERATEfound inconf/modules.configand Nextflow scripts. - modules_config -
STAR_ALIGNfound inconf/modules.configand Nextflow scripts. - modules_config -
KALLISTOBUSTOOLS_REFfound inconf/modules.configand Nextflow scripts. - modules_config -
KALLISTOBUSTOOLS_COUNTfound inconf/modules.configand Nextflow scripts. - modules_config -
FASTQCfound inconf/modules.configand Nextflow scripts. - modules_config -
NFCORE_SCRNASEQfound inconf/modules.configand Nextflow scripts. - modules_config -
GUNZIPfound inconf/modules.configand Nextflow scripts. - modules_config -
CELLRANGER_MKGTFfound inconf/modules.configand Nextflow scripts. - modules_config -
CELLRANGER_MKREFfound inconf/modules.configand Nextflow scripts. - modules_config -
CELLRANGER_MKVDJREFfound inconf/modules.configand Nextflow scripts. - nfcore_yml - Repository type in
.nf-core.ymlis valid:pipeline - nfcore_yml - nf-core version in
.nf-core.ymlis set to the latest version:2.14.1
Run details
- nf-core/tools version 2.14.1
- Run at
2024-05-22 17:08:54
Hi guys, @grst , @klkeys and @apeltzer , Do you guys have any other comments on this one? Specially about the following PR in nfcore/modules https://github.com/nf-core/modules/pull/4802 ?
As far as I see, this is just the one main blocker, so would be:
- Solving this nfcore/modules PR, which relates to the comments done about the fastqs renaming module (here).
- Top priority
- Once solving this one, I would just solve the other comments made by Greg so that is finished. Unless you guys have other comments on it.
@klkeys, @apeltzer for me an open discussion is whether the renaming should be a separate module or not - would appreciate additional opinions: https://github.com/nf-core/modules/pull/4802#pullrequestreview-1843492577
In the past I opted for including the renaming into the cellranger module for efficiency (avoiding heavy file transfers on cloud setups), but from a maintenance perspective it would be attractive to separate it, since it can be applied to all 10x workflows (i.e. spaceranger, cellranger, cellranger arc, cellranger multi).
Hi @grst ; @klkeys and @apeltzer .
background changes
I have solved almost all the comments that were opened and updated the code. The workflow has been tested and it seems to be working.
There is a new testing profile that you can use for checking it out during review:
nextflow run ../main.nf -profile docker,test_cellrangermulti --outdir ./result
But, would also be properly recommended to trigger it manually for another dataset.
open questions
Although most was resolved, there are still some things open for resolution:
- The
cellranger_multimodule (and sub-workflow) contain a set of additional CSVs that can be given as input. Currently they are treated as parameters, meaning is one for all the samples in the input samplesheet. A question here is, will this be the case, or will we have one per sample?- See here: https://github.com/nf-core/scrnaseq/pull/276#discussion_r1413510264
- Are we going to add the
testing cmo barcodestotest_datasetsnow?- See here: https://github.com/nf-core/scrnaseq/pull/276#discussion_r1413499008
- What will the colum name we will keep?
- See here: https://github.com/nf-core/scrnaseq/pull/276#discussion_r1413497114
See here for a full upgrade of all cellranger modules. https://github.com/nf-core/modules/pull/5016
Hi @grst ,
FYI.
I will include in this PR, the fix for cellranger/kallisto subworkflows to properly parse the raw/filtered matrices, splitting the channels, without the need for having multiple output channel directives in the modules.
This happen during this here https://github.com/nf-core/modules/pull/5238 which showed that this was not wanted. So we reverted the modules.
But, after some day on it, I could find a way of splitting so we could pass raw/filtered information to the downstream modules to integrate properly with the new emptydrops work.
I will include it in this PR, the fix because it is are a few lines at the end of the sub-workflow, which is basically the same being added in this multi-subworkflow. And was during this PR that I found the fix.
Hi @grst , @klkeys and @apeltzer ,
I have finalized the update of the modules and making sure the workflow runs. The pipeline runs for all the main aligners. I tested:
- alevin
- star
- kallisto standard
- kallisto standard with filtering
- kallisto nac
- kallisti lamanno
- cellranger
- cellrangermulti
Since I cannot share the results because they are big for attaching in the comment, I am adding a file containing the tree structure of all the testing runs I generated.
The workflow is updated and running for all of them. Cellranger multi is running as described in the first comment of the PR. The only open points thus far are the ones highlighted here: https://github.com/nf-core/scrnaseq/pull/276#issuecomment-1972760047
So, since it is working, and a few points are open, I consider this PR ready for a proper review, thus I am promoting it from draft to ready.
Wait your review and feedback.
nf-test
Although the github action fail, it runs through if I execute in gitpod:
Hi @grst , I do not quite get it this one: https://github.com/nf-core/scrnaseq/pull/276#discussion_r1551764285
I checked the current input samplesheet, and these values are not available there. But, and I saw that you tagged the three input csvs that the module receives. Are you referring to one specifically, or to all?
I am not sure I quite understood. Also, I am not the expert on these csvs, so do not know what they should be. Currently the tests have only the cmo barcodes csv. So, maybe @klkeys would help more with that.
Maybe we can have a chat the three of us to discuss this one and how to design it better now that we have something that runs?
Currently, emptydrops will not be performed for cellranger/multi as I am not sure if it is relevant. https://github.com/nf-core/scrnaseq/blob/247-support-for-10x-ffpe-scrna/workflows/scrnaseq.nf#L278
I think it's still relevant, at least when it runs in FFPE or CMO mode (probably not so much in CRISPR/BEAM/ANTIBODY mode, but I don't have experience with these data types). But this can also be added in a follow-up PR.
Hi @grst and @klkeys ,
Summary of latest code.
- [x] An
FFPEdataset was added in the test dataset to allow proper parsing - [x] During
FFPEanalysis, a few bugs and unproper handling of optional files in the cellranger/multi module were observed, and, a new PR was added here for that: https://github.com/nf-core/modules/pull/5542 - There are a few TODOs in the work, that we may resolve in this PR, or in another, they are:
- [ ] https://github.com/nf-core/scrnaseq/blob/247-support-for-10x-ffpe-scrna/subworkflows/local/align_cellrangermulti.nf#L24-L26
- [x] https://github.com/nf-core/scrnaseq/blob/247-support-for-10x-ffpe-scrna/subworkflows/local/align_cellrangermulti.nf#L228
- [x] https://github.com/nf-core/scrnaseq/blob/247-support-for-10x-ffpe-scrna/workflows/scrnaseq.nf#L290
- [x] https://github.com/nf-core/scrnaseq/blob/247-support-for-10x-ffpe-scrna/nextflow_schema.json#L328-L355 where I need help in providing description to files.
- [x] Finally, it also needs the nf-tests to be made ( I am currently working on them )
- [x] Once all above is solved, to add documentation.
Hi @grst,
Following up the previous message. nf-test has now been added and is working with github actions (discussion here).
One question that comes up from that, is: whether should we add the ensembl references used in the nf-core/test-datasets or not.
@nf-core-bot fix linting
One question that comes up from that, is: whether should we add the ensembl references used in the nf-core/test-datasets or not.
Ideally, yes, but I don't really care tbh ;)
HI @fmalmeida, @grst
I successfully completed the pipeline (5' Inmune profiling), which ran for approximately 7 hours. Notably, the CELLRANGER:MULTI process took the longest, spanning 5 hours and 45 minutes.
I had to restrict the run to half of my samples. Working with the ab data of 13.7GB and gex data of 59GB. When working with the total amount of data, the pipeline broke under the error "no disk space in scracht folder" despite having 200T available. Is this expected behavior?
Additionally, I observed that the specific process is allocated a time of 16 hours. In future runs, I aim to process 48 samples. Is it possible to adjust this setting for the extended sample size?
@nf-core-bot fix linting
HI @fmalmeida, @grst
I successfully completed the pipeline (5' Inmune profiling), which ran for approximately 7 hours. Notably, the CELLRANGER:MULTI process took the longest, spanning 5 hours and 45 minutes.
I had to restrict the run to half of my samples. Working with the ab data of 13.7GB and gex data of 59GB. When working with the total amount of data, the pipeline broke under the error "no disk space in scracht folder" despite having 200T available. Is this expected behavior?
Additionally, I observed that the specific process is allocated a time of 16 hours. In future runs, I aim to process 48 samples. Is it possible to adjust this setting for the extended sample size?
Hi @grst , You said you were able to run on one of your datasets. Do you know if this is normal? Did you also observe something like this, with so gigantic data generation?
I am not used to run this tool, so I do not know if this is expected or not.
Finally, @NNazeu , for you latter inquiry, you can modify anything from the configuration with a pattern like this one in a custom config file:
process {
withName: CELLRANGER_MULTI {
time = 24.h
}
}
Thanks for testing the PR.
I did observe the long runtime and maybe we should consider requesting more CPUs for cellranger multi. I didn't observe any excessive disk usage, at least it stayed well below a terabyte.
@NNazeu, are you sure you aren't hitting any disk quotas?
Ok. I think I'm done with the documentation.
I don't think lint will pass before the template update is merged. However, since this is such massive PR, I'd do that after this one gets merged to dev to avoid further disruptions.
Are we there then? I don't think there's anything pending for now.
Hi @grst,
I think so. The last things were the documentation and the things you had brought. There are a few TODOs but can be taken later. As the BEAM data.
This was the comment summarising it: https://github.com/nf-core/scrnaseq/pull/276#issuecomment-2084997587
We can have a last look on that, and open follow-ups, otherwise, it is.
I have added a follow-up ticket for the open points: https://github.com/nf-core/scrnaseq/issues/332