--bwa_index: string [database/indexes/GRCm38/BWA/genome.bwt] does not match pattern ^\S+\.\{amb,ann,bwt,pac,sa\}$ (database/indexes/GRCm38/BWA/genome.bwt)
Description of the bug
Hi! I'm using nf-core/circdna with the following configuration:
nextflow run nf-core/circdna \
-r 1.0.4 \
-profile docker \
-resume \
--max_cpus 9 \
--max_memory 21.GB \
--max_time 500.h \
--circle_identifier circle_map_realign,circle_map_repeats,circle_finder,circexplorer2,ampliconarchitect \
--input work/test_mouse/samplesheets/CIRCDNA.csv \
--outdir results/test_mouse/CIRCDNA \
--genome GRCm38 \
--bwa_index database/indexes/GRCm38/BWA/genome.bwt \
--reference_build mm10 \
--mosek_license_dir src/others \
--fasta database/genomes/GRCm38/genome.fasta \
--aa_data_repo database/indexes/GRCm38/aa_data_repo
And it throws this error:
ERROR ~ ERROR: Validation of pipeline parameters failed!
-- Check '.nextflow.log' file for details
ERROR ~ * --bwa_index: string [database/indexes/GRCm38/BWA/genome.bwt] does not match pattern ^\S+\.\{amb,ann,bwt,pac,sa\}$ (database/indexes/GRCm38/BWA/genome.bwt)
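I've been searching, and I think the regex pattern is wrong: \{amb,ann,bwt,pac,sa\} matches the literal text {amb,ann,bwt,pac,sa}, braces included, rather than any one of those extensions. The pattern should probably be ^\S+\.(amb|ann|bwt|pac|sa)$ (at least according to ChatGPT, and I tested it on https://regex101.com/).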
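The difference between the two patterns can be reproduced outside the pipeline. This is a minimal sketch that uses Python's `re` module as a stand-in for whatever regex engine the pipeline's parameter validation actually uses (an assumption; escaped braces are literal characters in the common regex flavors). The `matches` helper and the variable names are made up for illustration.

```python
import re

# Pattern quoted in the validation error: the escaped braces are literal characters.
SCHEMA_PATTERN = r"^\S+\.\{amb,ann,bwt,pac,sa\}$"
# Alternation proposed in this report.
PROPOSED_PATTERN = r"^\S+\.(amb|ann|bwt|pac|sa)$"

index_path = "database/indexes/GRCm38/BWA/genome.bwt"
literal_braces = "genome.{amb,ann,bwt,pac,sa}"


def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern matches the text (hypothetical helper)."""
    return re.search(pattern, text) is not None


print(matches(SCHEMA_PATTERN, index_path))      # False: a real index file is rejected
print(matches(SCHEMA_PATTERN, literal_braces))  # True: only the brace-literal name passes
print(matches(PROPOSED_PATTERN, index_path))    # True: the alternation accepts .bwt
```

The brace form looks like shell brace expansion, which a regex engine does not perform, so it would explain why a valid path such as genome.bwt is rejected.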
I'll have a look. In the meantime, could you just run it without the --bwa_index parameter? The index will be generated in the pipeline either way. I am looking into removing the parameter altogether to simplify things, but that is still open to debate.
Hey, I just fixed your bug in a way that should improve the user experience going forward. --bwa_index now only accepts a directory path: you need to pass the directory that contains all the BWA index files.
Is this acceptable for your use case?
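Before passing a directory to --bwa_index, it may help to confirm that it really holds the full set of index files. This is a minimal sketch, assuming the five extensions listed in the validation pattern (amb, ann, bwt, pac, sa) and the `genome` prefix mentioned in this thread. The function name `missing_bwa_index_files` is hypothetical and not part of the pipeline.

```python
from pathlib import Path

# Extensions taken from the validation pattern quoted in the error message.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_index_files(index_dir, prefix="genome"):
    """Return the expected index file names that are absent from index_dir."""
    directory = Path(index_dir)
    expected = [f"{prefix}{suffix}" for suffix in BWA_INDEX_SUFFIXES]
    return [name for name in expected if not (directory / name).is_file()]


# Example path from this thread; adjust it to your own layout.
missing = missing_bwa_index_files("database/indexes/GRCh38/BWA")
print("missing:", missing or "none")
```

An empty list means all five files sit directly in the directory under the assumed prefix. It does not prove that the index was built from the same FASTA passed with --fasta.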
Hi! Yes! It works fine. Thanks!
Hi! I've run the pipeline a second time with this config:
nextflow run nf-core/circdna \
-r dev \
-profile docker \
-resume \
--max_cpus 9 \
--max_memory 53.GB \
--max_time 500.h \
--circle_identifier circle_map_realign,circle_map_repeats,circle_finder,circexplorer2,ampliconarchitect \
--input work/test_human/samplesheets/CIRCDNA.csv \
--outdir results/test_human/CIRCDNA \
--genome GRCh38 \
--bwa_index database/indexes/GRCh38/BWA \
--input_format FASTQ \
--reference_build GRCh38 \
--mosek_license_dir src/others \
--fasta database/genomes/GRCh38/genome.fasta \
--aa_data_repo $(pwd)/database/indexes/GRCh38/aa_data_repo
And it fails with this error:
ERROR ~ Error executing process > 'NFCORE_CIRCDNA:CIRCDNA:BWA_MEM (CDNA_2)'
Caused by:
Process `NFCORE_CIRCDNA:CIRCDNA:BWA_MEM (CDNA_2)` terminated with an error exit status (1)
Command executed:
INDEX=`find -L ./ -name "*.amb" | sed 's/\.amb$//'`
bwa mem \
\
-t 9 \
$INDEX \
CDNA_2.trimmed_1_val_1.fq.gz CDNA_2.trimmed_2_val_2.fq.gz \
| samtools sort --threads 9 -o CDNA_2.bam -
cat <<-END_VERSIONS > versions.yml
"NFCORE_CIRCDNA:CIRCDNA:BWA_MEM":
bwa: $(echo $(bwa 2>&1) | sed 's/^.*Version: //; s/Contact:.*$//')
samtools: $(echo $(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*$//')
END_VERSIONS
Command exit status:
1
Command output:
(empty)
Command error:
[E::bwa_idx_load_from_disk] fail to locate the index files
samtools sort: failed to read header from "-"
Work dir:
/data/Proyectos/NGS_pipeline/work/62/016f2b2f4d03ae096654b3b1b7598f
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
-- Check '.nextflow.log' file for details
It also failed with --bwa_index database/indexes/GRCh38/BWA/genome; the index files are named genome.bwt, etc.
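To see what the INDEX line in the failing command would resolve to, the lookup can be mimicked from inside the process work dir. This is a minimal sketch that approximates `find -L ./ -name "*.amb" | sed 's/\.amb$//'` in Python. The function name `find_index_prefixes` is hypothetical, and the interpretation in the comments is a guess at the cause, not something confirmed in this thread.

```python
import os


def find_index_prefixes(root="."):
    """Collect paths of *.amb files under root, with the .amb suffix removed."""
    prefixes = []
    # followlinks=True approximates `find -L`, since staged inputs are often symlinks.
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=True):
        for name in filenames:
            if name.endswith(".amb"):
                prefixes.append(os.path.join(dirpath, name[: -len(".amb")]))
    return sorted(prefixes)


prefixes = find_index_prefixes(".")
if len(prefixes) == 1:
    print("bwa mem would receive the index prefix:", prefixes[0])
else:
    # Zero hits leaves $INDEX empty; several hits put more than one prefix on
    # the command line. Either way the positional arguments of `bwa mem` shift.
    print(f"expected exactly one .amb file, found {len(prefixes)}: {prefixes}")
```

If this finds nothing in the work dir, the index files were probably not staged into the task. That would be consistent with the "fail to locate the index files" message, though other causes are possible.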