Execution failed when --kmerfinder_db and --ncbi_assembly_metadata are provided

Open IBEXCluster opened this issue 1 year ago • 5 comments

Dear developers, we are trying to use the additional parameters --kmerfinder_db and --ncbi_assembly_metadata with the bacass workflow, using either release 2.4.0 or the development version. Both fail with the following error:

WARN: The following invalid input values have been detected:
 
* --kmerfinder_db: DATABASES/bacteria.tar.gz
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  Kmerfinder database and NCBI assembly metadata not provided.

  Please specify the '--kmerfinderdb' and '--ncbi_assembly_metadata' parameters.

  Both are required to run Kmerfinder.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Any advice?

Here is the run script:

rm -rf ~/.nextflow/assets/nf-core/bacass ;

nextflow run nf-core/bacass -r dev -c nextflow.config -profile singularity --input samplesheet.tsv --kraken2db /ibex/ai/reference/KSL/kraken2/kraken2_dbs/scripts_download/k2_nt_20230502.tar.gz --kmerfinder_db DATABASES/bacteria.tar.gz --ncbi_assembly_metadata ASSEMBLY-REPORTS/assembly_summary_refseq.txt --outdir outputs/2024-12-04_15-KSA-samples__local-downloads__dev

Here is the complete log file:

Nextflow 24.10.2 is available - Please consider updating your version to it

N E X T F L O W   ~  version 24.04.4

Pulling nf-core/bacass ...
downloaded from https://github.com/nf-core/bacass.git
Launching `https://github.com/nf-core/bacass` [thirsty_bhabha] DSL2 - revision: ad892edcdb [dev]

------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/bacass 2.5.0dev
------------------------------------------------------
Input/output options
  input                 : samplesheet.tsv
  outdir                : outputs/2024-12-04_15-KSA-samples__local-downloads__dev

Contamination Screening
  kraken2db             : /ibex/ai/reference/KSL/kraken2/kraken2_dbs/scripts_download/k2_nt_20230502.tar.gz
  ncbi_assembly_metadata: ASSEMBLY-REPORTS/assembly_summary_refseq.txt

Assembly parameters
  canu_mode             : -nanopore

Annotation
  dfast_config          : /home/pampum/.nextflow/assets/nf-core/bacass/assets/test_config_dfast.py

Core Nextflow options
  revision              : dev
  runName               : thirsty_bhabha
  containerEngine       : singularity
  launchDir             : /ibex/user/pampum/2024-11-26_KSA-lib-ONT-assemblies
  workDir               : /ibex/user/pampum/2024-11-26_KSA-lib-ONT-assemblies/work
  projectDir            : /home/pampum/.nextflow/assets/nf-core/bacass
  userName              : pampum
  profile               : singularity
  configFiles           :

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
* The pipeline
  https://doi.org/10.5281/zenodo.2669428

* The nf-core framework
    https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
    https://github.com/nf-core/bacass/blob/master/CITATIONS.md

WARN: The following invalid input values have been detected:

* --kmerfinder_db: DATABASES/bacteria.tar.gz

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  Kmerfinder database and NCBI assembly metadata not provided.
  Please specify the '--kmerfinderdb' and '--ncbi_assembly_metadata' parameters.
  Both are required to run Kmerfinder.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It seems we cannot use these custom parameters. Could you please help us fix this error?

IBEXCluster avatar Dec 05 '24 08:12 IBEXCluster

  • try using absolute paths for the kmer db and ncbi metadata input
  • possibly similar to issue #187

m-jahn avatar Dec 10 '24 10:12 m-jahn
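The absolute-path suggestion above can be sketched as a small bash helper. This is a minimal sketch, not part of the pipeline: the function name abs_path is hypothetical, and the relative paths are the ones from the run script. Note also that the error message spells the parameter '--kmerfinderdb' while the run script passes --kmerfinder_db; the thread does not settle which spelling the pipeline expects, so it is worth checking against the pipeline's parameter documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of nf-core/bacass): print the absolute
# form of an existing path, or fail if the path does not exist.
abs_path() {
    local target=$1
    if [ ! -e "$target" ]; then
        echo "missing: $target" >&2
        return 1
    fi
    local dir
    # Resolve the parent directory physically, then re-attach the file name.
    dir=$(cd "$(dirname "$target")" && pwd -P)
    printf '%s/%s\n' "$dir" "$(basename "$target")"
}

# Intended use (commented out so the sketch has no side effects):
#   KMER_DB=$(abs_path DATABASES/bacteria.tar.gz)
#   NCBI_META=$(abs_path ASSEMBLY-REPORTS/assembly_summary_refseq.txt)
#   nextflow run nf-core/bacass ... \
#       --kmerfinder_db "$KMER_DB" \
#       --ncbi_assembly_metadata "$NCBI_META"
# The flag spelling above is copied from the run script in this thread.
```

Because the helper fails on a missing file, a mistyped database path is caught before the pipeline is launched.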

Dear @m-jahn, thanks for your recommendations. I downloaded the KmerFinder database from https://zenodo.org/records/13447056. However, the kmerfinder step failed because it could not locate bacteria.tax, which is not part of that distribution.

i.e.,

Command executed:

  kmerfinder.py \
      --infile SRR10093029_1.fastp.fastq.gz SRR10093029_2.fastp.fastq.gz \
      --output_folder . \
      --db_path 20190108_kmerfinder_stable_dirs/bacteria.ATG \
      -tax 20190108_kmerfinder_stable_dirs/bacteria.tax \
      -x
  
  mv results.txt SRR10093029_results.txt
  mv data.json SRR10093029_data.json
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_BACASS:BACASS:KMERFINDER_SUBWORKFLOW:KMERFINDER":
      kmerfinder: $(echo "3.0.2")
  END_VERSIONS

Command exit status:
  1

Command output:
  # Time used to run KMA for species identifation: 0.016 s
  Cant open file: [Errno 2] No such file or directory: '20190108_kmerfinder_stable_dirs/bacteria.tax'

For reference, here is the description of this KmerFinder database (2019/01/08, 17 GB, stable dirs), which does not appear to include bacteria.tax:

Version: 20190108_stable_dirs
Website: ftp://ftp.cbs.dtu.dk/public/CGE/databases/KmerFinder/version/

Content 20190108_stable_dirs.tar.gz:

bacteria
├── bacteria.ATG.comp.b
├── bacteria.ATG.length.bp
├── bacteria.ATG.name
├── bacteria.ATG.seq.b
└── bacteria.name

Update: Same KmerFinder version, but the previous database was corrupted and resulted in untar errors. This version should fix that.

Any further suggestions? Thanks in advance.

IBEXCluster avatar Dec 10 '24 14:12 IBEXCluster

In your work directory for this module, rename bacteria.name to bacteria.tax. Then it will work. This is due to a change in the kmerfinder database structure. Strangely enough, the bacass pipeline should actually work with both, as this module looks for both the .name and .tax endings, but it doesn't.

m-jahn avatar Dec 10 '24 14:12 m-jahn
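A minimal bash sketch of the workaround above, under stated assumptions: the function name ensure_tax_file is hypothetical, the default group name bacteria is taken from the database listing in this thread, and the file is copied rather than renamed so that the original .name file stays in place. Whether the .name contents are accepted as a taxonomy file is the suggestion made above, not something this sketch verifies.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make sure <db_dir>/<group>.tax exists by copying
# <db_dir>/<group>.name when the .tax file is absent.
ensure_tax_file() {
    local db_dir=$1
    local group=${2:-bacteria}   # assumed default, from the listing above
    local name_file="$db_dir/$group.name"
    local tax_file="$db_dir/$group.tax"

    if [ -e "$tax_file" ]; then
        return 0                 # already present, nothing to do
    fi
    if [ ! -e "$name_file" ]; then
        echo "neither $tax_file nor $name_file exists" >&2
        return 1
    fi
    # Copy instead of rename, so the original file is kept.
    cp -- "$name_file" "$tax_file"
}

# Intended use (path is the one shown in the failing command):
#   ensure_tax_file 20190108_kmerfinder_stable_dirs bacteria
```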

More specifically, this line in modules/local/kmerfinder/main.nf looks for both file endings:

def db_tax = file("${kmerfinderdb_path}/${tax_group}.name").exists() ? "${kmerfinderdb_path}/${tax_group}.name" : "${kmerfinderdb_path}/${tax_group}.tax"

I cannot explain why it doesn't accept the .name file.

m-jahn avatar Dec 10 '24 14:12 m-jahn
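To check by hand which file that expression would select for a given directory, its logic can be mirrored in bash. This is only a sketch: pick_tax_file is a hypothetical name, and it reproduces the ternary as quoted, including the fact that the .tax fallback is returned without checking that the file exists. It does not explain why the pipeline chose .tax; one thing that may be worth checking (an assumption, not confirmed in this thread) is whether the pipeline evaluates the path from the same directory in which the database files are actually staged.

```shell
#!/usr/bin/env bash
# Hypothetical mirror of the quoted ternary: prefer <group>.name if it
# exists, otherwise fall back to <group>.tax without checking it.
pick_tax_file() {
    local db_dir=$1
    local group=${2:-bacteria}
    if [ -e "$db_dir/$group.name" ]; then
        printf '%s\n' "$db_dir/$group.name"
    else
        printf '%s\n' "$db_dir/$group.tax"
    fi
}

# Intended use, run from the task work directory:
#   pick_tax_file 20190108_kmerfinder_stable_dirs bacteria
```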

Many thanks @m-jahn for your recommendations. I noticed that the taxonomy file is not available at ftp://ftp.cbs.dtu.dk/public/CGE/databases/KmerFinder/version/. However, installing the required database from KmerFinder with the following command created the bacteria taxonomy file bacteria.tax: bash ~/kmerfinder/src/kmerfinder_db/INSTALL.sh $PWD bacteria latest. Thanks again for your suggestions!

IBEXCluster avatar Dec 11 '24 05:12 IBEXCluster