
BWA_MEM fails when params.fasta is a glob

Open chbk opened this issue 3 years ago • 0 comments

This is a bug report for an edge case I encountered. When the fasta option is given a glob, for example --fasta 'genome/*', the bwa_base variable is set to * at line 186. This seems to cause the BWA_MEM process to fail:

Error executing process > 'BWA_MEM (liver_R3_T1)'

Caused by:
  Process `BWA_MEM (liver_R3_T1)` terminated with an error exit status (1)

Command executed:

  bwa mem \
      -t 12 \
      -M \
      -R '@RG\tID:liver_R3_T1\tSM:liver_R3\tPL:ILLUMINA\tLB:liver_R3_T1\tPU:1' \
       \
      BWAIndex/* \
      liver_R3_T1_1_val_1.fq.gz liver_R3_T1_2_val_2.fq.gz \
      | samtools view -@ 12 -b -h -F 0x0100 -O BAM -o liver_R3_T1.Lb.bam -

Command exit status:
  1

Command output:
  (empty)

Command error:
         -y INT        seed occurrence for the 3rd round seeding [20]
         -c INT        skip seeds with more than INT occurrences [500]
         -D FLOAT      drop chains shorter than FLOAT fraction of the longest overlapping chain [0.50]
         -W INT        discard a chain if seeded bases shorter than INT [0]
         -m INT        perform at most INT rounds of mate rescues for each read [50]
         -S            skip mate rescue
         -P            skip pairing; mate rescue performed unless -S also in use
  
  Scoring options:
  
         -A INT        score for a sequence match, which scales options -TdBOELU unless overridden [1]
         -B INT        penalty for a mismatch [4]
         -O INT[,INT]  gap open penalties for deletions and insertions [6,6]
         -E INT[,INT]  gap extension penalty; a gap of size k cost '{-O} + {-E}*k' [1,1]
         -L INT[,INT]  penalty for 5'- and 3'-end clipping [5,5]
         -U INT        penalty for an unpaired read pair [17]
  
         -x STR        read type. Setting -x changes multiple parameters unless overridden [null]
                       pacbio: -k17 -W40 -r10 -A1 -B1 -O1 -E1 -L0  (PacBio reads to ref)
                       ont2d: -k14 -W20 -r10 -A1 -B1 -O1 -E1 -L0  (Oxford Nanopore 2D-reads to ref)
                       intractg: -B9 -O16 -L5  (intra-species contigs to ref)
  
  Input/output options:
  
         -p            smart pairing (ignoring in2.fq)
         -R STR        read group header line such as '@RG\tID:foo\tSM:bar' [null]
         -H STR/FILE   insert STR to header if it starts with @; or insert lines in FILE [null]
         -o FILE       sam file to output results to [stdout]
         -j            treat ALT contigs as part of the primary assembly (i.e. ignore <idxbase>.alt file)
         -5            for split alignment, take the alignment with the smallest coordinate as primary
         -q            don't modify mapQ of supplementary alignments
         -K INT        process INT input bases in each batch regardless of nThreads (for reproducibility) []
  
         -v INT        verbosity level: 1=error, 2=warning, 3=message, 4+=debugging [3]
         -T INT        minimum score to output [30]
         -h INT[,INT]  if there are <INT hits with score >80% of the max score, output all in XA [5,200]
         -a            output all alignments for SE or unpaired PE
         -C            append FASTA/FASTQ comment to SAM output
         -V            output the reference FASTA header in the XR tag
         -Y            use soft clipping for supplementary alignments
         -M            mark shorter split hits as secondary
  
         -I FLOAT[,FLOAT[,INT[,INT]]]
                       specify the mean, standard deviation (10% of the mean if absent), max
                       (4 sigma from the mean if absent) and min of the insert size distribution.
                       FR orientation only. [inferred]
  
  Note: Please read the man page for detailed description of the command line and options.
  
  [main_samview] fail to read the header from "-".

• Nextflow version 20.10.0
• nf-core/atacseq v1.2.1
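
The suspected mechanism can be sketched outside the pipeline. This is only an illustration under assumptions: `basename` stands in for whatever line 186 actually does to derive bwa_base, and the index file names (genome.fa.amb and so on) are hypothetical. It shows that a bwa_base of * makes BWAIndex/* expand to every file in the index directory, so bwa mem would presumably receive several index arguments instead of one prefix, which would be consistent with the usage text in the error above.

```shell
#!/bin/sh
# Assumption: the pipeline derives bwa_base from the last path component of
# --fasta; basename is used here only as a stand-in for that logic.
fasta='genome/*'
bwa_base=$(basename "$fasta")
echo "bwa_base=$bwa_base"

# Build a throwaway index directory with hypothetical BWA index file names.
# (The temporary directory is left in place; remove it manually if desired.)
tmp=$(mktemp -d)
mkdir "$tmp/BWAIndex"
for ext in amb ann bwt pac sa; do
    : > "$tmp/BWAIndex/genome.fa.$ext"
done
cd "$tmp" || exit 1

# Left unquoted on purpose: the shell expands the glob, just as it would in
# the generated command line "bwa mem ... BWAIndex/* reads_1.fq.gz reads_2.fq.gz".
set -- BWAIndex/$bwa_base
echo "index arguments that bwa mem would receive: $#"
```

Passing a single FASTA path rather than a glob (for example the one file inside genome/) avoids the expansion, since the index prefix then names exactly one set of index files.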

chbk, Jan 26 '21 12:01