Refactoring some of the global steps.
Description of feature
In the current design of the quantms workflow, some steps, such as FILE_PREPARATION and SUMMARY_PIPELINE, sit at the root of the workflow (for example QUANTMS:FILE_PREPARATION, see the log 👇), while the majority of the steps are under QUANTMS:DIA:{}. This is confusing because the user doesn't know whether such a step is part of the DIA workflow or something else.
[61/ebba76] Submitted process > BIGBIO_QUANTMS:QUANTMS:FILE_PREPARATION:THERMORAWFILEPARSER (RD139_Narrow_UPS1_0_1fmol_inj1)
[11/da3064] Submitted process > BIGBIO_QUANTMS:QUANTMS:FILE_PREPARATION:THERMORAWFILEPARSER (RD139_Narrow_UPS1_0_1fmol_inj2)
[92/c06fe7] Submitted process > BIGBIO_QUANTMS:QUANTMS:FILE_PREPARATION:THERMORAWFILEPARSER (RD139_Narrow_UPS1_0_25fmol_inj1)
[a8/cbce0a] Submitted process > BIGBIO_QUANTMS:QUANTMS:FILE_PREPARATION:THERMORAWFILEPARSER (RD139_Narrow_UPS1_0_25fmol_inj2)
[03/2951bf] Submitted process > BIGBIO_QUANTMS:QUANTMS:DIA:GENERATE_CFG (PXD026600.sdrf)
[de/bbcb6d] Submitted process > BIGBIO_QUANTMS:QUANTMS:FILE_PREPARATION:MZML_STATISTICS (RD139_Narrow_UPS1_0_1fmol_inj1)
[bb/dd9713] Submitted process > BIGBIO_QUANTMS:QUANTMS:FILE_PREPARATION:MZML_STATISTICS (RD139_Narrow_UPS1_0_1fmol_inj2)
[18/c95e9a] Submitted process > BIGBIO_QUANTMS:QUANTMS:FILE_PREPARATION:MZML_STATISTICS (RD139_Narrow_UPS1_0_25fmol_inj1)
[f8/0419f0] Submitted process > BIGBIO_QUANTMS:QUANTMS:FILE_PREPARATION:MZML_STATISTICS (RD139_Narrow_UPS1_0_25fmol_inj2)
[a6/bd956f] Submitted process > BIGBIO_QUANTMS:QUANTMS:DIA:INSILICO_LIBRARY_GENERATION (REF_EColi_K12_UPS1_combined.fasta)
[24/55e348] Submitted process > BIGBIO_QUANTMS:QUANTMS:DIA:PRELIMINARY_ANALYSIS (RD139_Narrow_UPS1_0_25fmol_inj1)
[bb/84b883] Submitted process > BIGBIO_QUANTMS:QUANTMS:DIA:PRELIMINARY_ANALYSIS (RD139_Narrow_UPS1_0_25fmol_inj2)
[05/1c9dee] Submitted process > BIGBIO_QUANTMS:QUANTMS:DIA:PRELIMINARY_ANALYSIS (RD139_Narrow_UPS1_0_1fmol_inj1)
[31/8025f2] Submitted process > BIGBIO_QUANTMS:QUANTMS:DIA:PRELIMINARY_ANALYSIS (RD139_Narrow_UPS1_0_1fmol_inj2)
[25/4a9fe1] Submitted process > BIGBIO_QUANTMS:QUANTMS:DIA:ASSEMBLE_EMPIRICAL_LIBRARY (PXD026600.sdrf)
[a7/8c0f3b] Submitted process > BIGBIO_QUANTMS:QUANTMS:DIA:INDIVIDUAL_ANALYSIS (RD139_Narrow_UPS1_0_25fmol_inj2)
[3a/95b91a] Submitted process > BIGBIO_QUANTMS:QUANTMS:DIA:INDIVIDUAL_ANALYSIS (RD139_Narrow_UPS1_0_1fmol_inj1)
[65/0149f6] Submitted process > BIGBIO_QUANTMS:QUANTMS:DIA:INDIVIDUAL_ANALYSIS (RD139_Narrow_UPS1_0_25fmol_inj1)
[54/c62c57] Submitted process > BIGBIO_QUANTMS:QUANTMS:DIA:INDIVIDUAL_ANALYSIS (RD139_Narrow_UPS1_0_1fmol_inj2)
[bf/61dd1a] Submitted process > BIGBIO_QUANTMS:QUANTMS:DIA:FINAL_QUANTIFICATION (PXD026600.sdrf)
DIANNCONVERT is based on the output of DIA-NN 1.8.1, 2.0.* and 2.1.*, other versions of DIA-NN don't support mzTab conversion.
[a5/95c2c6] Submitted process > BIGBIO_QUANTMS:QUANTMS:DIA:CONVERT_RESULTS (PXD026600.sdrf)
[05/693a45] Submitted process > BIGBIO_QUANTMS:QUANTMS:DIA:MSSTATS_LFQ (PXD026600.sdrf_openms_design_msstats_in.csv)
[92/f17800] Submitted process > BIGBIO_QUANTMS:QUANTMS:SUMMARY_PIPELINE (1)
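To make the inconsistency easier to see, here is a minimal Python sketch that groups the process names from a log like the one above by the scope directly under `BIGBIO_QUANTMS:QUANTMS`. The helper `group_by_scope` and the `(root)` label are names made up for this illustration, not part of quantms or Nextflow, and the sample only includes a few of the lines above.

```python
import re
from collections import defaultdict

# A few lines copied from the log above (not the full run).
LOG = """\
[61/ebba76] Submitted process > BIGBIO_QUANTMS:QUANTMS:FILE_PREPARATION:THERMORAWFILEPARSER (RD139_Narrow_UPS1_0_1fmol_inj1)
[03/2951bf] Submitted process > BIGBIO_QUANTMS:QUANTMS:DIA:GENERATE_CFG (PXD026600.sdrf)
[de/bbcb6d] Submitted process > BIGBIO_QUANTMS:QUANTMS:FILE_PREPARATION:MZML_STATISTICS (RD139_Narrow_UPS1_0_1fmol_inj1)
[bf/61dd1a] Submitted process > BIGBIO_QUANTMS:QUANTMS:DIA:FINAL_QUANTIFICATION (PXD026600.sdrf)
[92/f17800] Submitted process > BIGBIO_QUANTMS:QUANTMS:SUMMARY_PIPELINE (1)
"""

# Captures the fully qualified process name that follows "Submitted process >".
PROCESS_RE = re.compile(r"Submitted process > (\S+)")


def group_by_scope(log_text, root="BIGBIO_QUANTMS:QUANTMS"):
    """Map each scope directly under `root` to the sorted process names seen in it."""
    scopes = defaultdict(set)
    for match in PROCESS_RE.finditer(log_text):
        name = match.group(1)
        if not name.startswith(root + ":"):
            continue
        parts = name[len(root) + 1:].split(":")
        # A single remaining component means the process sits directly under the root.
        scope = parts[0] if len(parts) > 1 else "(root)"
        scopes[scope].add(parts[-1])
    return {scope: sorted(names) for scope, names in scopes.items()}


if __name__ == "__main__":
    for scope, names in group_by_scope(LOG).items():
        print(f"{scope}: {', '.join(names)}")
```

On this sample, SUMMARY_PIPELINE is the only process reported without an enclosing scope, while the file preparation steps share a scope that is a sibling of DIA rather than part of it.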
I'm wondering if it would be worth refactoring this and taking a different approach. We have several options:
- [ ] 1. Leave it as it is, with generic steps at the level of QUANTMS: QUANTMS:{}
- [ ] 2. Repeat some of these steps in each of DIA, TMT and LFQ: QUANTMS:DIA:{} or QUANTMS:LFQ:{}
- [ ] 3. Have a generic prefix like: QUANTMS:GENERIC:{}
- [ ] 4. Create a global subworkflow for both kinds of steps: QUANTMS:FILE_PROCESSING:{STEPS} and QUANTMS:SUMMARY_PIPELINE:PMULTIQC
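As a rough illustration of what each option would mean for the names shown in the log, the sketch below builds the fully qualified name of a shared step under options 1 to 4. This is only one reading of the options: the function `shared_step_name` is hypothetical, and treating option 1 as "keep the current FILE_PREPARATION scope" is an assumption rather than something the list states.

```python
ROOT = "BIGBIO_QUANTMS:QUANTMS"


def shared_step_name(process, option, analysis="DIA"):
    """Return a hypothetical qualified name for a shared step under one option."""
    scopes = {
        1: "FILE_PREPARATION",  # assumed: names stay as they are today
        2: analysis,            # step repeated inside DIA, TMT or LFQ
        3: "GENERIC",           # generic prefix
        4: "FILE_PROCESSING",   # global subworkflow
    }
    if option not in scopes:
        raise ValueError(f"unknown option: {option}")
    return ":".join([ROOT, scopes[option], process])


if __name__ == "__main__":
    for option in (1, 2, 3, 4):
        print(option, shared_step_name("THERMORAWFILEPARSER", option))
```

Under this reading, only option 2 changes with the analysis type, which is also why it would require repeating the step in each of DIA, TMT and LFQ.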
What do you think, guys?
I have no strong opinion here.
Option 1: it's the only one that correctly represents the DAG.
Actually @jpfeuffer the only step that is not doing this is:
Submitted process > BIGBIO_QUANTMS:QUANTMS:SUMMARY_PIPELINE
We should probably move it into some kind of BIGBIO_QUANTMS:QUANTMS:SUMMARY_PIPELINE:PMULTIQC
Option 1: it is consistent with the steps. I have no strong opinion on pmultiqc, because SUMMARY_PIPELINE contains only one step; pmultiqc has just been renamed to SUMMARY_PIPELINE.
Ah, if pmultiqc is the only step in the summary pipeline subworkflow, you can also just use pmultiqc directly in the parent workflow if you want. But you don't have to; it's fine with me.
This issue has been solved in the following PR #551