
Bug in quantms related to Parquet and sdrf-pipelines

Open · ypriverol opened this issue 4 months ago · 1 comment

Description of the bug

Plus 28 more processes waiting for tasks…
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/quantms] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_QUANTMS:QUANTMS:INPUT_CHECK:SAMPLESHEET_CHECK (PXD000001.sdrf.tsv)'

Caused by: Process NFCORE_QUANTMS:QUANTMS:INPUT_CHECK:SAMPLESHEET_CHECK (PXD000001.sdrf.tsv) terminated with an error exit status (1)

Command executed:

quantmsutilsc checksamplesheet --exp_design "PXD000001.sdrf.tsv" --is_sdrf --skip_factor_validation --use_ols_cache_only 2>&1 | tee input_check.log

cat <<-END_VERSIONS > versions.yml
"NFCORE_QUANTMS:QUANTMS:INPUT_CHECK:SAMPLESHEET_CHECK":
    quantms-utils: $(pip show quantms-utils | grep "Version" | awk -F ': ' '{print $2}')
END_VERSIONS

Command exit status: 1

Command output:

2024-10-06 11:26:49,019 [] - platform is linux
2024-10-06 11:26:49,071 [wrapper] - CACHEDIR=/tmp/matplotlib-1e0a9r_x
2024-10-06 11:26:49,071 [init] - font search path [PosixPath('/usr/local/lib/python3.10/site-packages/matplotlib/mpl-data/fonts/ttf'), PosixPath('/usr/local/lib/python3.10/site-packages/matplotlib/mpl-data/fonts/afm'), PosixPath('/usr/local/lib/python3.10/site-packages/matplotlib/mpl-data/fonts/pdfcorefonts')]
Fontconfig error: No writable cache directories
2024-10-06 11:26:49,348 [_load_fontmanager] - generated new fontManager
Traceback (most recent call last):
  File "/usr/local/bin/quantmsutilsc", line 10, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/site-packages/quantmsutils/quantmsutilsc.py", line 38, in main
    cli()
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/quantmsutils/sdrf/check_samplesheet.py", line 189, in checksamplesheet
    check_sdrf(
  File "/usr/local/lib/python3.10/site-packages/quantmsutils/sdrf/check_samplesheet.py", line 55, in check_sdrf
    errors = df.validate(DEFAULT_TEMPLATE, use_ols_cache_only)
  File "/usr/local/lib/python3.10/site-packages/sdrf_pipelines/sdrf/sdrf.py", line 79, in validate
    errors = default_schema.validate(self, use_ols_cache_only=use_ols_cache_only)
  File "/usr/local/lib/python3.10/site-packages/sdrf_pipelines/sdrf/sdrf_schema.py", line 218, in validate
    error_ontology_terms = self.validate_columns(panda_sdrf, use_ols_cache_only=use_ols_cache_only)
  File "/usr/local/lib/python3.10/site-packages/sdrf_pipelines/sdrf/sdrf_schema.py", line 329, in validate_columns
    errors += column.validate(series)
  File "/usr/local/lib/python3.10/site-packages/pandas_schema/column.py", line 27, in validate
    return [error for validation in self.validations for error in validation.get_errors(series, self)]
  File "/usr/local/lib/python3.10/site-packages/pandas_schema/column.py", line 27, in <listcomp>
    return [error for validation in self.validations for error in validation.get_errors(series, self)]
  File "/usr/local/lib/python3.10/site-packages/pandas_schema/validation.py", line 85, in get_errors
    simple_validation = ~self.validate(series)
  File "/usr/local/lib/python3.10/site-packages/sdrf_pipelines/sdrf/sdrf_schema.py", line 149, in validate
    ontology_terms = client.search(
  File "/usr/local/lib/python3.10/site-packages/sdrf_pipelines/ols/ols.py", line 286, in search
    terms = self.cache_search(term, ontology)
  File "/usr/local/lib/python3.10/site-packages/sdrf_pipelines/ols/ols.py", line 414, in cache_search
    duckdb_conn = duckdb.execute(
  File "/usr/local/lib/python3.10/site-packages/duckdb/__init__.py", line 225, in execute
    return conn.execute(query, parameters, multiple_parameter_sets, **kwargs)
duckdb.duckdb.ConversionException: Conversion Error: In Parquet reader of file "/usr/local/lib/python3.10/site-packages/sdrf_pipelines/ols/psi-ms.parquet": failed to cast column "accession" from type VARCHAR to INTEGER: Could not convert string 'NCIT:C25330' to INT32

In file "/usr/local/lib/python3.10/site-packages/sdrf_pipelines/ols/psi-ms.parquet" the column "accession" has type VARCHAR, but we are trying to read it as type INTEGER. This can happen when reading multiple Parquet files. The schema information is taken from the first Parquet file by default. Possible solutions:

  • Enable the union_by_name=True option to combine the schema of all Parquet files (duckdb.org/docs/data/multiple_files/combining_schemas)
  • Use a COPY statement to automatically derive types from an existing table.

Command error: identical to the command output above (same log lines, traceback, and DuckDB conversion error).

Work dir: /Users/yperez/work/quantms/work/ce/9f2986f1eb38825f0d7f4a75cae3d4

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

-- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting

-- Check '.nextflow.log' file for details
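
For reference, the first solution suggested in the DuckDB message (union_by_name) could look roughly like the sketch below. This is only an illustration: I have not checked how sdrf_pipelines builds its query in cache_search, the helper names build_union_query and read_cache are made up here, and whether union_by_name actually resolves the VARCHAR vs INTEGER conflict for the "accession" column in these cache files is untested.

```python
from typing import Sequence


def build_union_query(parquet_paths: Sequence[str]) -> str:
    """Build a DuckDB query that reads several Parquet files by column name.

    Hypothetical helper, not part of sdrf-pipelines.
    """
    if not parquet_paths:
        raise ValueError("at least one Parquet path is required")
    # Quote each path as a SQL string literal, doubling embedded single quotes.
    quoted = ", ".join("'" + p.replace("'", "''") + "'" for p in parquet_paths)
    # union_by_name asks DuckDB to combine the schemas of all files instead of
    # taking the schema from the first file only.
    return f"SELECT * FROM read_parquet([{quoted}], union_by_name = true)"


def read_cache(parquet_paths: Sequence[str]):
    """Run the query with DuckDB (hypothetical helper).

    duckdb is imported lazily so that build_union_query works without it.
    """
    import duckdb  # third-party package: pip install duckdb

    return duckdb.execute(build_union_query(parquet_paths)).fetchall()
```

Calling read_cache with the list of .parquet files under sdrf_pipelines/ols would then return all rows if the schemas can be combined; if the same conversion error still appears, the files themselves probably need to be regenerated with a consistent "accession" type.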

Command used and terminal output

No response

Relevant files

No response

System information

No response

ypriverol · Oct 06 '24 12:10