cutandrun Version 3.2 does not run, fails in frag histogram step

Description of the bug

The same arguments work with version 3.1, but version 3.2 fails with the following error:

frag_hist["group"] = group_short NameError: name 'group_short' is not defined

Command used and terminal output

nextflow run nf-core/cutandrun \
    -r 3.2  \
    -profile singularity \
    --input /scratch/project_2001539/Sami/nfcore_cutandrun/sample_sheets/sample_sheet.040723.csv \
    --outdir /scratch/project_2001539/Sami/nfcore_cutandrun/outdir/out_210923 \
    --genome mm10 \
    --blacklist ~/metadata/mm10-blacklist.v2.bed \
    --peakcaller seacr, MACS2 \
    --normalisation_mode CPM \
    --bowtie2 /scratch/project_2004451/refgenomes/alias/mm10/bowtie2_index/default/ \
    --minimum_alignment_q_score 30 \
    --replicate_threshold 2 \

Error:
frag_hist["group"] = group_short
  NameError: name 'group_short' is not defined

Relevant files

No response

System information

Nextflow version 23.04.3, Slurm, HPC

Sep 25 '23 14:09 skilpinen

Just echoing this -- I got the exact same error running the nf-core/cutandrun v3.2-g506a325 version of the pipeline on Nextflow 23.04.3 in a Slurm/HPC setting.

Oct 10 '23 14:10 batzza

Hi, I am just about to release a patch as 3.2.1. Could you try it once it is released?

Oct 22 '23 10:10 chris-cheshire

Sure, I actually just had some C&R data come off the sequencer on Friday so I planned to run the pipeline this week anyways. I'll follow up in the next few days.

Oct 22 '23 23:10 batzza

I came up with

--- a/bin/calc_frag_hist.py
+++ b/bin/calc_frag_hist.py
@@ -57,6 +57,9 @@ print("Calclate fragment histogram")
 frag_path = os.path.abspath(args.frag_path)
 frag_hist = None
 
+group_short = None
+rep_short = None
+
 # Create list of deeptools raw fragment files
 dt_frag_list = glob.glob(frag_path)
 dt_frag_list.sort()

but I would happily receive other instructions to test.

Oct 25 '23 08:10 smoe

After addressing variable visibility, the error changes to

ERROR ~ Error executing process > 'NFCORE_CUTANDRUN:CUTANDRUN:FRAG_LEN_HIST'

Caused by:
  Process `NFCORE_CUTANDRUN:CUTANDRUN:FRAG_LEN_HIST` terminated with an error exit status (1)

Command executed:

  calc_frag_hist.py \
      --frag_path "*len.txt" \
      --output frag_len_hist.txt
  
  if [ -f "frag_len_hist.txt" ]; then
      cat frag_len_header.txt frag_len_hist.txt > frag_len_mqc.yml
  fi
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_CUTANDRUN:CUTANDRUN:FRAG_LEN_HIST":
      python: $(python --version | grep -E -o "([0-9]{1,}\.)+[0-9]{1,}")
      numpy: $(python -c 'import numpy; print(numpy.__version__)')
      pandas: $(python -c 'import pandas; print(pandas.__version__)')
      seaborn: $(python -c 'import seaborn; print(seaborn.__version__)')
  END_VERSIONS

Command exit status:
  1

Command output:
  Calclate fragment histogram

Command error:
  Calclate fragment histogram
  Traceback (most recent call last):
    File "/home/sm718/.nextflow/assets/nf-core/cutandrun/bin/calc_frag_hist.py", line 117, in <module>
      frag_hist["group"] = group_short
  TypeError: 'NoneType' object does not support item assignment

Oct 25 '23 13:10 smoe

No one else is reporting these errors, this is strange!

Oct 27 '23 07:10 chris-cheshire

I just sent a batch job to our cluster with v3.2.1; let's see what happens.

Oct 27 '23 07:10 skilpinen

Cannot find revision 3.2.1 -- Make sure that it exists in the remote repository https://github.com/nf-core/cutandrun

Can I call this with explicit -r 3.2.1? Or is it still just 3.2?

Oct 27 '23 07:10 skilpinen

3.2.1 is out. nextflow pull cutandrun just performed the update for me. Still running :-/

My initial reflex was that this could be a Python-version issue wrt variable scoping when these are implicitly defined in the loop - but I tested this and this is not the case. That said, Python is 3.7 for me.

From how I interpret this, this only makes sense if the loop is never entered, i.e. if list(range(len(dt_frag_list))) is empty.
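A minimal sketch of that reading, assuming the loop is the only place where group_short is assigned. The function describe_failure and the file name used below are hypothetical stand-ins and not the actual calc_frag_hist.py code. With an empty file list it reproduces the original NameError, and with the variables pre-set to None it reproduces the later TypeError:

```python
def describe_failure(files, predefine):
    """Return the name of the exception class raised, or None on success.

    Hypothetical stand-in for the loop in calc_frag_hist.py.
    """
    frag_hist = None
    if predefine:
        group_short = None  # mirrors the "group_short = None" patch
    try:
        for path in files:
            # Both names are only given real values inside the loop body.
            frag_hist = {"path": path}
            group_short = path.split("_")[0]
        # The right-hand side is evaluated first, so an unbound group_short
        # fails before the subscript assignment on frag_hist is attempted.
        frag_hist["group"] = group_short
    except NameError:
        # Inside a function Python raises UnboundLocalError, a subclass of
        # NameError; at module level, as in the script, it is a plain NameError.
        return "NameError"
    except TypeError:
        return "TypeError"
    return None


print(describe_failure([], predefine=False))               # NameError
print(describe_failure([], predefine=True))                # TypeError
print(describe_failure(["wt_R1.len.txt"], predefine=True)) # None
```

Both errors only appear for an empty list, which points at the glob returning nothing rather than at Python scoping rules.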

Does that make any sense?

Oct 27 '23 13:10 smoe

I reran 3.2.1 and it first failed with the same error, so I added

--- a/bin/calc_frag_hist.py
+++ b/bin/calc_frag_hist.py
@@ -25,6 +25,7 @@
 # Author: @chris-cheshire
 
 import os
+import sys
 import glob
 import argparse
 
@@ -61,6 +62,8 @@ frag_hist = None
 dt_frag_list = glob.glob(frag_path)
 dt_frag_list.sort()
 
+print("I: Iterating over "+str(len(dt_frag_list))+" files with fragments derived from glob search '"+frag_path+"' - this number should not be 0, the argument was '"+args.frag_path+"', all executed in '"+os.getcwd()+"'")
+
 for i in list(range(len(dt_frag_list))):
     # Create dataframe from csv file for each file and save to a list of data frames
     dt_frag_i = pd.read_csv(dt_frag_list[i], sep="\t", header=None, names=["Size", "Occurrences"])

and resumed with -resume, corrected the Python errors in my debug line :-) and resumed again. Then it worked. I noticed that not all peak lists were cached on resume; some files still had to be processed. Could that explain the error? For a clearer error message, would you consider adding an extra debug message, or even an early exit when an expectation is violated, such as the glob returning no results? Would you like me to prepare a PR for that?
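As a rough illustration of the early exit suggested above, here is a sketch under stated assumptions. The helper name find_fragment_files and the wording of the message are hypothetical and not part of the pipeline; only glob.glob, os.getcwd and sys.exit are real standard-library calls:

```python
import glob
import os
import sys


def find_fragment_files(pattern):
    """Return the sorted glob matches, or exit with a readable message.

    Hypothetical helper; the pipeline script calls glob.glob() directly.
    """
    files = sorted(glob.glob(pattern))
    if not files:
        # sys.exit() with a string prints it to stderr and exits with status 1,
        # so Nextflow would show this text instead of a NameError traceback.
        sys.exit(
            f"ERROR: no fragment length files matched '{pattern}' "
            f"in '{os.getcwd()}'"
        )
    return files
```

In the script this could replace the glob.glob(frag_path) and sort() pair, so that an empty match stops the process before the loop is reached.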

Oct 30 '23 23:10 smoe
