Error: Unexpected internal error ((Fraction Group, Fraction, Label) combination can only appear once)
Description of the bug
Although each combination of Fraction_Group, Fraction, and Label is unique, the pipeline fails at the PROTEIN_QUANTIFIER process with the error ((Fraction Group, Fraction, Label) combination can only appear once).
Attached are the SDRF.tsv, the openms_design.tsv, and a screenshot of the run.
PXD009920.sdrf_openms_design.tsv
Command used and terminal output
-[bigbio/quantms] Pipeline completed with errors-
ERROR ~ Error executing process > 'BIGBIO_QUANTMS:QUANTMS:TMT:PROTEIN_QUANT:MSSTATS_CONVERTER (PXD009920.sdrf_openms_design.tsv)'
Caused by:
Process `BIGBIO_QUANTMS:QUANTMS:TMT:PROTEIN_QUANT:MSSTATS_CONVERTER (PXD009920.sdrf_openms_design.tsv)` terminated with an error exit status (8)
Command executed:
MSstatsConverter \
-in ID_mapper_merge_epi_filter_resconf.consensusXML \
-in_design PXD009920.sdrf_openms_design.tsv \
-method ISO \
-out PXD009920.sdrf_openms_design_msstats_in.csv \
-debug 0 \
2>&1 | tee MSstatsConverter.log
cat <<-END_VERSIONS > versions.yml
"BIGBIO_QUANTMS:QUANTMS:TMT:PROTEIN_QUANT:MSSTATS_CONVERTER":
MSstatsConverter: $(MSstatsConverter 2>&1 | grep -E '^Version(.*)' | sed 's/Version: //g' | cut -d ' ' -f 1)
END_VERSIONS
Command exit status:
8
Command output:
Error: Unexpected internal error ((Fraction Group, Fraction, Label) combination can only appear once)
Command error:
Error: Unexpected internal error ((Fraction Group, Fraction, Label) combination can only appear once)
Work dir:
/Data/nayan/Shortcut_Data/pysradb_downloads/IPX0004838000/PXD009920/work/c7/f38450e1a602b8265ab8dbf6d4070f
Container:
ghcr.io/bigbio/openms-tools-thirdparty:2025.04.14
Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
-- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting
-- Check '.nextflow.log' file for details
Relevant files
No response
System information
No response
Hi!
The error is here:
| Fraction_Group | Fraction | Spectra_Filepath | Label | Sample |
|---|---|---|---|---|
| 1 | 1 | Margolis_Mouse_Neuronal_TMT_TP_F1.mzML | 8 | 8 |
| 1 | 1 | Margolis_Mouse_Neuronal_TMT_TP_F1.mzML | 8 | 9 |
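The two rows above share the same (Fraction Group, Fraction, Label) triple, which is what MSstatsConverter rejects. Below is a minimal sketch for finding such rows in a design file before starting the pipeline. It is not part of quantms or OpenMS. The column names `Fraction_Group`, `Fraction`, and `Label`, and the blank line separating the file table from the sample table, are assumptions about the design file layout. `find_duplicate_keys` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict

# Assumed column names of the file table in an OpenMS experimental design.
KEY_COLUMNS = ("Fraction_Group", "Fraction", "Label")


def find_duplicate_keys(design_text, key_columns=KEY_COLUMNS):
    """Return {key: [row numbers]} for keys that occur more than once.

    Only the first table is read. Parsing stops at the first blank line
    after the table has started, which is assumed to separate the file
    table from the sample table.
    """
    first_table = []
    for line in design_text.splitlines():
        if not line.strip():
            if first_table:
                break
            continue
        first_table.append(line)

    reader = csv.DictReader(io.StringIO("\n".join(first_table)), delimiter="\t")
    seen = defaultdict(list)
    # Row numbers are 1-based and count data rows only (header excluded).
    for row_number, row in enumerate(reader, start=1):
        key = tuple(row[column] for column in key_columns)
        seen[key].append(row_number)
    return {key: rows for key, rows in seen.items() if len(rows) > 1}


# The two rows quoted in this thread, with the assumed header line.
example = (
    "Fraction_Group\tFraction\tSpectra_Filepath\tLabel\tSample\n"
    "1\t1\tMargolis_Mouse_Neuronal_TMT_TP_F1.mzML\t8\t8\n"
    "1\t1\tMargolis_Mouse_Neuronal_TMT_TP_F1.mzML\t8\t9\n"
)
print(find_duplicate_keys(example))
```

For a real file, pass the text of the design file, for example `find_duplicate_keys(open("PXD009920.sdrf_openms_design.tsv").read())`. An empty result means no triple is repeated.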
@jpfeuffer Thank you very much! However, the issue still persists. Is there an installable tool that can check an SDRF file for mistakes like this? (Thank you again in advance.)
Hi @nayanvs, there is a duplicate label value for this file in your SDRF. Each combination of data file and label must be unique. Please change TMT130N to TMT130C.
It appears we are currently adding this uniqueness validation logic, but that version has not been released yet. https://github.com/bigbio/sdrf-pipelines/blob/7b7dd367ba7ffe105356e9ba0c1f29348d3169e1/sdrf_pipelines/sdrf/validators.py#L76
@jpfeuffer @daichengxin Thank you for your feedback!
Hi, I have resolved this, but the issue still persists. See the updated files below. By the way, are you using any tool to detect these errors?
You could use this script to check for duplicate rows, @nayanvs:
import pandas as pd
# Load the SDRF and list every row whose (data file, label) pair occurs more than once.
sdrf = pd.read_csv("PXD009920.sdrf.tsv", sep="\t")
dups = sdrf[sdrf.duplicated(subset=["comment[data file]", "comment[label]"], keep=False)]
print(dups)
The last files look fine to me. Did you make sure there are no leftovers of the old file cached somewhere? I am not sure how this behaves during a nextflow resume.
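One way to rule out a stale copy is to compare the edited design file with the one the failing process actually read. The following is only a sketch: `sha256_of` and `same_content` are hypothetical helper names, and both paths in the usage note are placeholders to adapt to your run.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    # Path.open follows symlinks, so staged links resolve to the real file.
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(current_file, staged_file):
    """Return True if both paths hold byte-identical content."""
    return sha256_of(current_file) == sha256_of(staged_file)
```

For example, `same_content("PXD009920.sdrf_openms_design.tsv", "<work dir>/PXD009920.sdrf_openms_design.tsv")`, where `<work dir>` stands for the `Work dir:` path printed in the error. If it returns `False`, the process did not see your latest edits.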
Please raise an issue or even better create a PR at the sdrf repository. It should be checked in the validator. As @daichengxin mentioned maybe this is checked already but the version of the validator we use in quantms is outdated.
@ypriverol do you know more?
@nayanvs did you manage to solve the SDRF issue?
Thank you for your input. I was able to resolve the issue by deduplicating the labels. I did encounter another error afterward, but it appears to be unrelated.
`[d6/e21f72] process > NFCORE_QUANTMS:QUANTMS:TMT:MSSTATSTMT (PXD009920.sdrf_openms_design_msstats_in.csv) [ 0%] 0 of 1 ✘
[- ] process > NFCORE_QUANTMS:QUANTMS:LFQ:ID:DATABASESEARCHENGINES:SEARCHENGINECOMET -
[- ] process > NFCORE_QUANTMS:QUANTMS:LFQ:ID:PSMRESCORING:EXTRACTPSMFEATURES -
[- ] process > NFCORE_QUANTMS:QUANTMS:LFQ:ID:PSMRESCORING:PERCOLATOR -
[- ] process > NFCORE_QUANTMS:QUANTMS:LFQ:ID:PSMFDRCONTROL:IDSCORESWITCHER -
[- ] process > NFCORE_QUANTMS:QUANTMS:LFQ:ID:PSMFDRCONTROL:IDFILTER -
[- ] process > NFCORE_QUANTMS:QUANTMS:LFQ:PROTEOMICSLFQ -
[- ] process > NFCORE_QUANTMS:QUANTMS:LFQ:MSSTATS -
[- ] process > NFCORE_QUANTMS:QUANTMS:DIA:DIANNCFG -
[- ] process > NFCORE_QUANTMS:QUANTMS:DIA:SILICOLIBRARYGENERATION -
[- ] process > NFCORE_QUANTMS:QUANTMS:DIA:DIANN_PRELIMINARY_ANALYSIS -
[- ] process > NFCORE_QUANTMS:QUANTMS:DIA:ASSEMBLE_EMPIRICAL_LIBRARY -
[- ] process > NFCORE_QUANTMS:QUANTMS:DIA:INDIVIDUAL_FINAL_ANALYSIS -
[- ] process > NFCORE_QUANTMS:QUANTMS:DIA:DIANNSUMMARY -
[- ] process > NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT -
[- ] process > NFCORE_QUANTMS:QUANTMS:DIA:MSSTATS -
[- ] process > NFCORE_QUANTMS:QUANTMS:SUMMARYPIPELINE -
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/quantms] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_QUANTMS:QUANTMS:TMT:MSSTATSTMT (PXD009920.sdrf_openms_design_msstats_in.csv)'
Caused by:
Process NFCORE_QUANTMS:QUANTMS:TMT:MSSTATSTMT (PXD009920.sdrf_openms_design_msstats_in.csv) terminated with an error exit status (1)
Command executed:
msstats_tmt.R
PXD009920.sdrf_openms_design_msstats_in.csv
"pairwise"
""
true
true
false
sum
msstats
true
true
true
PXD009920.sdrf_openms_design_msstats_in
0.05
false
2>&1 | tee msstats_tmt.log
cat <<-END_VERSIONS > versions.yml
"NFCORE_QUANTMS:QUANTMS:TMT:MSSTATSTMT":
r-base: $(echo $(R --version 2>&1) | sed 's/^.R version //; s/ .$//')
bioconductor-msstatstmt: $(Rscript -e "library(MSstatsTMT); cat(as.character(packageVersion('MSstatsTMT')))")
END_VERSIONS
Command exit status: 1
Command output:
|======================================================================| 100%
Error in merge.data.table(input[, colnames(input) != "newABUNDANCE", with = FALSE], :
Elements listed in by must be valid column names in x and y
Calls: proteinSummarization ... .finalizeInput -> .finalizeTMP -> merge -> merge.data.table
In addition: Warning messages:
1: In min(abundance[nonmissing], na.rm = TRUE) :
no non-missing arguments to min; returning Inf
2: In merge.data.table(input[, colnames(input) != "newABUNDANCE", with = FALSE], :
You are trying to join data.tables where 'y' argument is 0 columns data.table.
Execution halted
Command error:
|======================================================================| 100%
Error in merge.data.table(input[, colnames(input) != "newABUNDANCE", with = FALSE], :
Elements listed in by must be valid column names in x and y
Calls: proteinSummarization ... .finalizeInput -> .finalizeTMP -> merge -> merge.data.table
In addition: Warning messages:
1: In min(abundance[nonmissing], na.rm = TRUE) :
no non-missing arguments to min; returning Inf
2: In merge.data.table(input[, colnames(input) != "newABUNDANCE", with = FALSE], :
You are trying to join data.tables where 'y' argument is 0 columns data.table.
Execution halted
Work dir: /Data/nayan/Shortcut_Data/pysradb_downloads/IPX0004838000/PXD009920/work/d6/e21f72636172bd59b52bb898768421
Container: quay.io/biocontainers/bioconductor-msstatstmt:2.10.0--r43hdfd78af_0
Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run
-- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting
-- Check '.nextflow.log' file for details `
I think this check needs to be included in the validation step at the start of the quant analysis, for both the LFQ and TMT pipelines.