Error: Unexpected internal error ((Fraction Group, Fraction, Label) combination can only appear once)

Open nayanvs opened this issue 1 month ago • 7 comments

Description of the bug

Despite having unique combinations of Fraction_Group, Fraction and Label, the pipeline ends at the PROTEIN_QUANTIFIER process with the error ((Fraction Group, Fraction, Label) combination can only appear once).

Attached are the SDRF.tsv, the openms_design.tsv and a screenshot of the run.

Image

PXD009920.sdrf.tsv

PXD009920.sdrf_openms_design.tsv

Command used and terminal output

-[bigbio/quantms] Pipeline completed with errors-
ERROR ~ Error executing process > 'BIGBIO_QUANTMS:QUANTMS:TMT:PROTEIN_QUANT:MSSTATS_CONVERTER (PXD009920.sdrf_openms_design.tsv)'

Caused by:
  Process `BIGBIO_QUANTMS:QUANTMS:TMT:PROTEIN_QUANT:MSSTATS_CONVERTER (PXD009920.sdrf_openms_design.tsv)` terminated with an error exit status (8)


Command executed:

  MSstatsConverter \
      -in ID_mapper_merge_epi_filter_resconf.consensusXML \
      -in_design PXD009920.sdrf_openms_design.tsv \
      -method ISO \
      -out PXD009920.sdrf_openms_design_msstats_in.csv \
      -debug 0 \
      2>&1 | tee MSstatsConverter.log
  
  cat <<-END_VERSIONS > versions.yml
  "BIGBIO_QUANTMS:QUANTMS:TMT:PROTEIN_QUANT:MSSTATS_CONVERTER":
      MSstatsConverter: $(MSstatsConverter 2>&1 | grep -E '^Version(.*)' | sed 's/Version: //g' | cut -d ' ' -f 1)
  END_VERSIONS

Command exit status:
  8

Command output:
  Error: Unexpected internal error ((Fraction Group, Fraction, Label) combination can only appear once)

Command error:
  Error: Unexpected internal error ((Fraction Group, Fraction, Label) combination can only appear once)

Work dir:
  /Data/nayan/Shortcut_Data/pysradb_downloads/IPX0004838000/PXD009920/work/c7/f38450e1a602b8265ab8dbf6d4070f

Container:
  ghcr.io/bigbio/openms-tools-thirdparty:2025.04.14

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting

 -- Check '.nextflow.log' file for details

Relevant files

No response

System information

No response

nayanvs avatar Nov 18 '25 07:11 nayanvs

Hi!

The error is here:

1 1 Margolis_Mouse_Neuronal_TMT_TP_F1.mzML 8 8
1 1 Margolis_Mouse_Neuronal_TMT_TP_F1.mzML 8 9
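
The two rows above share Fraction_Group 1, Fraction 1 and Label 8 and differ only in the last column (Sample), which is exactly what MSstatsConverter rejects. A minimal sketch of how such rows could be found automatically, assuming the design file is tab-separated, starts with a file table whose header is `Fraction_Group`, `Fraction`, `Spectra_Filepath`, `Label`, `Sample`, and is followed after a blank line by a sample table. The function name `find_duplicate_keys` is made up for this illustration and is not part of OpenMS or quantms:

```python
import csv
import io
from collections import defaultdict

# Assumed header names of the file table in the OpenMS design file.
KEY_COLUMNS = ("Fraction_Group", "Fraction", "Label")


def find_duplicate_keys(design_text):
    """Return {(fraction_group, fraction, label): [rows]} for keys seen more than once."""
    # Only the first table (everything before the first blank line) lists files and labels.
    first_table = []
    for line in design_text.splitlines():
        if not line.strip():
            if first_table:
                break
            continue
        first_table.append(line)

    reader = csv.DictReader(io.StringIO("\n".join(first_table)), delimiter="\t")
    seen = defaultdict(list)
    for row in reader:
        key = tuple(row[column].strip() for column in KEY_COLUMNS)
        seen[key].append(row)
    return {key: rows for key, rows in seen.items() if len(rows) > 1}


# The two data rows are the ones quoted in this comment; the headers are assumed.
example = (
    "Fraction_Group\tFraction\tSpectra_Filepath\tLabel\tSample\n"
    "1\t1\tMargolis_Mouse_Neuronal_TMT_TP_F1.mzML\t8\t8\n"
    "1\t1\tMargolis_Mouse_Neuronal_TMT_TP_F1.mzML\t8\t9\n"
    "\n"
    "Sample\tMSstats_Condition\tMSstats_BioReplicate\n"
)

duplicates = find_duplicate_keys(example)
for key, rows in duplicates.items():
    print(key, "is used by samples", [row["Sample"] for row in rows])
```

To check a real file, the text could be read with `open("PXD009920.sdrf_openms_design.tsv").read()` and passed to the same function.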

jpfeuffer avatar Nov 18 '25 09:11 jpfeuffer

@jpfeuffer Thank you very much! However, the issue still persists. Is there an installable tool that can check an SDRF file for mistakes like this? (Thank you again in advance.)

nayanvs avatar Nov 19 '25 05:11 nayanvs

Hi @nayanvs, there is a duplicate label value for this file in your SDRF. The combination of data file and label must be unique. Please change TMT130N to TMT130C. It appears we are currently adding this uniqueness validation logic, but that version has not been released yet. https://github.com/bigbio/sdrf-pipelines/blob/7b7dd367ba7ffe105356e9ba0c1f29348d3169e1/sdrf_pipelines/sdrf/validators.py#L76

Image

daichengxin avatar Nov 19 '25 05:11 daichengxin

@jpfeuffer @daichengxin Thank you for your feedback!

Hi, I have fixed this, but the issue still persists. See the updated files below. By the way, are you using any tool to detect these errors?

PXD009920.sdrf.tsv

PXD009920.sdrf_openms_design.tsv

nayanvs avatar Nov 19 '25 08:11 nayanvs

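You could use this script to check for duplicate rows, @nayanvs:

import pandas as pd

# Load the SDRF and list every row whose (data file, label) pair occurs more than once
sdrf = pd.read_csv("PXD009920.sdrf.tsv", sep="\t")
dups = sdrf[sdrf.duplicated(subset=["comment[data file]", "comment[label]"], keep=False)]
print(dups)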

daichengxin avatar Nov 21 '25 08:11 daichengxin

The last files look fine to me. Did you make sure there are no leftovers of the old file cached somewhere? I am not sure how this behaves during a nextflow resume.

Please raise an issue or even better create a PR at the sdrf repository. It should be checked in the validator. As @daichengxin mentioned maybe this is checked already but the version of the validator we use in quantms is outdated.

@ypriverol do you know more?

jpfeuffer avatar Nov 21 '25 09:11 jpfeuffer

@nayanvs did you manage to solve the SDRF issue?

ypriverol avatar Nov 28 '25 06:11 ypriverol

Thank you for your input. I was able to resolve the issue by deduplicating the labels. I did encounter another error afterward, but it appears to be unrelated.

`[d6/e21f72] process > NFCORE_QUANTMS:QUANTMS:TMT:MSSTATSTMT (PXD009920.sdrf_openms_design_msstats_in.csv) [ 0%] 0 of 1 ✘
[- ] process > NFCORE_QUANTMS:QUANTMS:LFQ:ID:DATABASESEARCHENGINES:SEARCHENGINECOMET -
[- ] process > NFCORE_QUANTMS:QUANTMS:LFQ:ID:PSMRESCORING:EXTRACTPSMFEATURES -
[- ] process > NFCORE_QUANTMS:QUANTMS:LFQ:ID:PSMRESCORING:PERCOLATOR -
[- ] process > NFCORE_QUANTMS:QUANTMS:LFQ:ID:PSMFDRCONTROL:IDSCORESWITCHER -
[- ] process > NFCORE_QUANTMS:QUANTMS:LFQ:ID:PSMFDRCONTROL:IDFILTER -
[- ] process > NFCORE_QUANTMS:QUANTMS:LFQ:PROTEOMICSLFQ -
[- ] process > NFCORE_QUANTMS:QUANTMS:LFQ:MSSTATS -
[- ] process > NFCORE_QUANTMS:QUANTMS:DIA:DIANNCFG -
[- ] process > NFCORE_QUANTMS:QUANTMS:DIA:SILICOLIBRARYGENERATION -
[- ] process > NFCORE_QUANTMS:QUANTMS:DIA:DIANN_PRELIMINARY_ANALYSIS -
[- ] process > NFCORE_QUANTMS:QUANTMS:DIA:ASSEMBLE_EMPIRICAL_LIBRARY -
[- ] process > NFCORE_QUANTMS:QUANTMS:DIA:INDIVIDUAL_FINAL_ANALYSIS -
[- ] process > NFCORE_QUANTMS:QUANTMS:DIA:DIANNSUMMARY -
[- ] process > NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT -
[- ] process > NFCORE_QUANTMS:QUANTMS:DIA:MSSTATS -
[- ] process > NFCORE_QUANTMS:QUANTMS:SUMMARYPIPELINE -
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/quantms] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_QUANTMS:QUANTMS:TMT:MSSTATSTMT (PXD009920.sdrf_openms_design_msstats_in.csv)'

Caused by: Process NFCORE_QUANTMS:QUANTMS:TMT:MSSTATSTMT (PXD009920.sdrf_openms_design_msstats_in.csv) terminated with an error exit status (1)

Command executed:

msstats_tmt.R
PXD009920.sdrf_openms_design_msstats_in.csv
"pairwise"
""
true
true
false
sum
msstats
true
true
true
PXD009920.sdrf_openms_design_msstats_in
0.05
false

2>&1 | tee msstats_tmt.log

cat <<-END_VERSIONS > versions.yml
"NFCORE_QUANTMS:QUANTMS:TMT:MSSTATSTMT":
    r-base: $(echo $(R --version 2>&1) | sed 's/^.R version //; s/ .$//')
    bioconductor-msstatstmt: $(Rscript -e "library(MSstatsTMT); cat(as.character(packageVersion('MSstatsTMT')))")
END_VERSIONS

Command exit status: 1

Command output: |============================================================= | 87% |
|============================================================= | 88% |
|============================================================== | 88% |
|============================================================== | 89% |
|=============================================================== | 89% |
|=============================================================== | 90% |
|=============================================================== | 91% |
|================================================================ | 91% |
|================================================================ | 92% |
|================================================================= | 92% |
|================================================================= | 93% |
|================================================================== | 94% |
|================================================================== | 95% |
|=================================================================== | 95% |
|=================================================================== | 96% |
|==================================================================== | 97% |
|==================================================================== | 98% |
|===================================================================== | 98% |
|===================================================================== | 99% |
|======================================================================| 99% |
|======================================================================| 100%
Error in merge.data.table(input[, colnames(input) != "newABUNDANCE", with = FALSE], :
  Elements listed in by must be valid column names in x and y
Calls: proteinSummarization ... .finalizeInput -> .finalizeTMP -> merge -> merge.data.table
In addition: Warning messages:
1: In min(abundance[nonmissing], na.rm = TRUE) :
  no non-missing arguments to min; returning Inf
2: In merge.data.table(input[, colnames(input) != "newABUNDANCE", with = FALSE], :
  You are trying to join data.tables where 'y' argument is 0 columns data.table.
Execution halted

Command error: |============================================================= | 87% |
|============================================================= | 88% |
|============================================================== | 88% |
|============================================================== | 89% |
|=============================================================== | 89% |
|=============================================================== | 90% |
|=============================================================== | 91% |
|================================================================ | 91% |
|================================================================ | 92% |
|================================================================= | 92% |
|================================================================= | 93% |
|================================================================== | 94% |
|================================================================== | 95% |
|=================================================================== | 95% |
|=================================================================== | 96% |
|==================================================================== | 97% |
|==================================================================== | 98% |
|===================================================================== | 98% |
|===================================================================== | 99% |
|======================================================================| 99% |
|======================================================================| 100%
Error in merge.data.table(input[, colnames(input) != "newABUNDANCE", with = FALSE], :
  Elements listed in by must be valid column names in x and y
Calls: proteinSummarization ... .finalizeInput -> .finalizeTMP -> merge -> merge.data.table
In addition: Warning messages:
1: In min(abundance[nonmissing], na.rm = TRUE) :
  no non-missing arguments to min; returning Inf
2: In merge.data.table(input[, colnames(input) != "newABUNDANCE", with = FALSE], :
  You are trying to join data.tables where 'y' argument is 0 columns data.table.
Execution halted

Work dir: /Data/nayan/Shortcut_Data/pysradb_downloads/IPX0004838000/PXD009920/work/d6/e21f72636172bd59b52bb898768421

Container: quay.io/biocontainers/bioconductor-msstatstmt:2.10.0--r43hdfd78af_0

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

-- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting

-- Check '.nextflow.log' file for details `

nayanvs avatar Dec 02 '25 23:12 nayanvs

I think this needs to be included in the validation step of the quant analysis for LFQ and TMT pipelines, in the original step.

ypriverol avatar Dec 03 '25 10:12 ypriverol