circdna --bwa_index: string [database/indexes/GRCm38/BWA/genome.bwt] does not match pattern ^\S+\.\{amb,ann,bwt,pac,sa\}$ (database/indexes/GRCm38/BWA/genome.bwt)

Description of the bug

Hi! I'm using nf-core/circdna with the following configuration:

nextflow run nf-core/circdna \
-r 1.0.4 \
-profile docker \
-resume \
--max_cpus 9 \
--max_memory 21.GB \
--max_time 500.h \
--circle_identifier circle_map_realign,circle_map_repeats,circle_finder,circexplorer2,ampliconarchitect \
--input work/test_mouse/samplesheets/CIRCDNA.csv \
--outdir results/test_mouse/CIRCDNA \
--genome GRCm38 \
--bwa_index database/indexes/GRCm38/BWA/genome.bwt \
--reference_build mm10 \
--mosek_license_dir src/others \
--fasta database/genomes/GRCm38/genome.fasta \
--aa_data_repo database/indexes/GRCm38/aa_data_repo

And it throws this error:

ERROR ~ ERROR: Validation of pipeline parameters failed!

 -- Check '.nextflow.log' file for details
ERROR ~ * --bwa_index: string [database/indexes/GRCm38/BWA/genome.bwt] does not match pattern ^\S+\.\{amb,ann,bwt,pac,sa\}$ (database/indexes/GRCm38/BWA/genome.bwt)

I've been searching, and I think the regex pattern is wrong: it should probably be ^\S+\.(amb|ann|bwt|pac|sa)$ instead (at least according to ChatGPT, and I tested it at https://regex101.com/).
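
To compare the two patterns locally, here is a minimal sketch using grep -E. It is only an approximation: the pipeline presumably validates parameters with its own regex engine rather than grep, and \S is written here as the POSIX class [^[:space:]] so the check does not depend on GNU extensions. The matches and check helpers and the variable names are made up for this example. In the pattern from the error message, \{ and \} are escaped, so they match literal braces and the comma-separated list is never treated as a set of alternatives.

```shell
#!/bin/sh
# Return 0 if the string in $2 matches the extended regex in $1.
matches() {
    printf '%s\n' "$2" | grep -Eq -e "$1"
}

# Pattern from the error message: the escaped braces are literal characters.
schema_pattern='^[^[:space:]]+\.\{amb,ann,bwt,pac,sa\}$'
# Proposed fix: a group with alternation between the five extensions.
proposed_pattern='^[^[:space:]]+\.(amb|ann|bwt|pac|sa)$'

path='database/indexes/GRCm38/BWA/genome.bwt'

# Print whether the candidate path ($3) matches the pattern ($2).
check() {
    if matches "$2" "$3"; then
        echo "$1: match"
    else
        echo "$1: no match"
    fi
}

check "schema pattern" "$schema_pattern" "$path"      # prints "schema pattern: no match"
check "proposed pattern" "$proposed_pattern" "$path"  # prints "proposed pattern: match"
```

The only kind of string the original pattern accepts is one that literally ends in .{amb,ann,bwt,pac,sa}, which is consistent with the validation error above.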

Jan 31 '24 10:01 alexmascension

I'll have a look. In the meantime, could you just run it without the --bwa_index parameter? The index will be generated by the pipeline either way. I am considering removing the parameter altogether to simplify things, but that is still up for debate.

Feb 03 '24 07:02 DSchreyer

Hey, I just fixed your bug in a way that should improve the user experience going forward. --bwa_index now only accepts directory paths: you need to pass the directory that contains all the BWA index files.
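
For anyone preparing a directory to pass to --bwa_index, a quick pre-flight check might look like the sketch below. It assumes a complete index consists of the five files whose extensions appear in the original validation pattern (amb, ann, bwt, pac, sa), all sharing one prefix. bwa_index_prefix is a hypothetical helper for illustration, not part of the pipeline.

```shell
#!/bin/sh
# Print the shared prefix of a complete BWA index found in directory $1.
# Return 1 if no prefix in that directory has all five index files.
bwa_index_prefix() {
    dir=$1
    for amb in "$dir"/*.amb; do
        # Skip the unexpanded glob when the directory has no *.amb file.
        [ -e "$amb" ] || continue
        prefix=${amb%.amb}
        complete=yes
        for ext in amb ann bwt pac sa; do
            [ -e "$prefix.$ext" ] || complete=no
        done
        if [ "$complete" = yes ]; then
            printf '%s\n' "$prefix"
            return 0
        fi
    done
    return 1
}
```

For example, bwa_index_prefix database/indexes/GRCh38/BWA would print database/indexes/GRCh38/BWA/genome if genome.amb, genome.ann, genome.bwt, genome.pac and genome.sa are all present, and return a non-zero status otherwise.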

Is this acceptable for your use case?

Feb 04 '24 11:02 DSchreyer

Hi! Yes! It works fine. Thanks!

Feb 05 '24 13:02 alexmascension

Hi! I've run the pipeline a second time, with this config:

nextflow run nf-core/circdna \
-r dev \
-profile docker \
-resume \
--max_cpus 9 \
--max_memory 53.GB \
--max_time 500.h \
--circle_identifier circle_map_realign,circle_map_repeats,circle_finder,circexplorer2,ampliconarchitect \
--input work/test_human/samplesheets/CIRCDNA.csv \
--outdir results/test_human/CIRCDNA \
--genome GRCh38 \
--bwa_index database/indexes/GRCh38/BWA \
--input_format FASTQ \
--reference_build GRCh38 \
--mosek_license_dir src/others \
--fasta database/genomes/GRCh38/genome.fasta \
--aa_data_repo $(pwd)/database/indexes/GRCh38/aa_data_repo

And it fails:

ERROR ~ Error executing process > 'NFCORE_CIRCDNA:CIRCDNA:BWA_MEM (CDNA_2)'

Caused by:
  Process `NFCORE_CIRCDNA:CIRCDNA:BWA_MEM (CDNA_2)` terminated with an error exit status (1)

Command executed:

  INDEX=`find -L ./ -name "*.amb" | sed 's/\.amb$//'`
  
  bwa mem \
       \
      -t 9 \
      $INDEX \
      CDNA_2.trimmed_1_val_1.fq.gz CDNA_2.trimmed_2_val_2.fq.gz \
      | samtools sort  --threads 9 -o CDNA_2.bam -
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_CIRCDNA:CIRCDNA:BWA_MEM":
      bwa: $(echo $(bwa 2>&1) | sed 's/^.*Version: //; s/Contact:.*$//')
      samtools: $(echo $(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*$//')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  [E::bwa_idx_load_from_disk] fail to locate the index files
  samtools sort: failed to read header from "-"

Work dir:
  /data/Proyectos/NGS_pipeline/work/62/016f2b2f4d03ae096654b3b1b7598f

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details

It also failed with --bwa_index database/indexes/GRCh38/BWA/genome; the index files are genome.bwt, etc.
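
The index lookup from the process script above can be reproduced by hand in the work dir. The sketch below runs the same find/sed pipeline, but stops with a message unless it finds exactly one *.amb file, since an empty $INDEX would leave bwa mem without an index argument. resolve_index is a hypothetical debugging helper. It is not part of the pipeline, and it does not by itself explain why the lookup comes back empty.

```shell
#!/bin/sh
# Resolve the BWA index prefix under directory $1 the same way the
# process script does, but fail loudly unless exactly one *.amb is found.
resolve_index() {
    search_dir=$1
    # -L follows symlinks, which is how staged inputs appear in a work dir.
    found=$(find -L "$search_dir" -name '*.amb' | sed 's/\.amb$//')
    # Count non-empty lines; grep exits non-zero on zero matches, hence "|| true".
    count=$(printf '%s\n' "$found" | grep -c . || true)
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one *.amb under $search_dir, found $count" >&2
        return 1
    fi
    printf '%s\n' "$found"
}
```

Running resolve_index ./ inside the work dir shows whether the staged index is visible at all. A count of 0 would mean the *.amb file was not reachable from the work dir when the command ran.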

Feb 07 '24 16:02 alexmascension