nf-core/cutandrun

Version 3.2 does not run: fails in the frag histogram step

Status: Open. skilpinen opened this issue 1 year ago; 10 comments.

Description of the bug

The same arguments work with version 3.1, but 3.2 fails with the following error:

frag_hist["group"] = group_short
  NameError: name 'group_short' is not defined

Command used and terminal output

nextflow run nf-core/cutandrun \
    -r 3.2  \
    -profile singularity \
    --input /scratch/project_2001539/Sami/nfcore_cutandrun/sample_sheets/sample_sheet.040723.csv \
    --outdir /scratch/project_2001539/Sami/nfcore_cutandrun/outdir/out_210923 \
    --genome mm10 \
    --blacklist ~/metadata/mm10-blacklist.v2.bed \
    --peakcaller seacr, MACS2 \
    --normalisation_mode CPM \
    --bowtie2 /scratch/project_2004451/refgenomes/alias/mm10/bowtie2_index/default/ \
    --minimum_alignment_q_score 30 \
    --replicate_threshold 2 \

Error:
frag_hist["group"] = group_short
  NameError: name 'group_short' is not defined

Relevant files

No response

System information

N E X T F L O W ~ version 23.04.3, Slurm, HPC

skilpinen, Sep 25 '23 14:09

Just echoing this: I got the exact same error running version v3.2-g506a325 of the nf-core/cutandrun pipeline on Nextflow 23.04.3 in a Slurm/HPC setting.

batzza, Oct 10 '23 14:10

Hi, I am about to release a patch, 3.2.1. Could you try it once it is released?

chris-cheshire, Oct 22 '23 10:10

Sure. I just had some CUT&RUN data come off the sequencer on Friday, so I was planning to run the pipeline this week anyway. I'll follow up in the next few days.

batzza, Oct 22 '23 23:10

I came up with

--- a/bin/calc_frag_hist.py
+++ b/bin/calc_frag_hist.py
@@ -57,6 +57,9 @@ print("Calclate fragment histogram")
 frag_path = os.path.abspath(args.frag_path)
 frag_hist = None
 
+group_short = None
+rep_short = None
+
 # Create list of deeptools raw fragment files
 dt_frag_list = glob.glob(frag_path)
 dt_frag_list.sort()

but would be happy to test any other suggestions.

smoe, Oct 25 '23 08:10

After addressing the variable visibility, the error changes to:

ERROR ~ Error executing process > 'NFCORE_CUTANDRUN:CUTANDRUN:FRAG_LEN_HIST'

Caused by:
  Process `NFCORE_CUTANDRUN:CUTANDRUN:FRAG_LEN_HIST` terminated with an error exit status (1)

Command executed:

  calc_frag_hist.py \
      --frag_path "*len.txt" \
      --output frag_len_hist.txt
  
  if [ -f "frag_len_hist.txt" ]; then
      cat frag_len_header.txt frag_len_hist.txt > frag_len_mqc.yml
  fi
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_CUTANDRUN:CUTANDRUN:FRAG_LEN_HIST":
      python: $(python --version | grep -E -o "([0-9]{1,}\.)+[0-9]{1,}")
      numpy: $(python -c 'import numpy; print(numpy.__version__)')
      pandas: $(python -c 'import pandas; print(pandas.__version__)')
      seaborn: $(python -c 'import seaborn; print(seaborn.__version__)')
  END_VERSIONS

Command exit status:
  1

Command output:
  Calclate fragment histogram

Command error:
  Calclate fragment histogram
  Traceback (most recent call last):
    File "/home/sm718/.nextflow/assets/nf-core/cutandrun/bin/calc_frag_hist.py", line 117, in <module>
      frag_hist["group"] = group_short
  TypeError: 'NoneType' object does not support item assignment
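
Pre-initialising the names only moves the failure. A minimal sketch, assuming the file list is empty and eliding the loop body (this is an illustration, not the script itself): frag_hist is still None when the assignment runs, and item assignment on None raises exactly this TypeError.

```python
frag_hist = None    # initialised before the loop, as in the script's context lines
group_short = None  # initialised by the patch above, so there is no NameError any more

dt_frag_list = []   # assumption: the glob matched no files
for i in range(len(dt_frag_list)):
    pass            # the real loop would build frag_hist and set group_short here

try:
    frag_hist["group"] = group_short
except TypeError as err:
    print(err)  # 'NoneType' object does not support item assignment
```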

smoe, Oct 25 '23 13:10

No one else is reporting these errors; this is strange!

chris-cheshire, Oct 27 '23 07:10

I just sent a batch job to our cluster with v3.2.1; let's see what happens.

skilpinen, Oct 27 '23 07:10

Cannot find revision 3.2.1 -- Make sure that it exists in the remote repository https://github.com/nf-core/cutandrun

Can I call this with an explicit -r 3.2.1, or is it still just 3.2?

skilpinen, Oct 27 '23 07:10

3.2.1 is out; "nextflow pull cutandrun" just performed the update for me. Still running :-/

My first thought was that this could be a Python-version issue with variable scoping, since these variables are only assigned inside the loop, but I tested this and it is not the case. For reference, I am using Python 3.7.
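
A minimal sketch of the scoping rule in question (the helper and file names are hypothetical, not taken from the pipeline): Python has no block scope, so a name assigned inside a for-loop body stays visible after the loop in every Python 3 version, but it is never created if the body runs zero times.

```python
def label_after_loop(files):
    """Hypothetical helper: return the label assigned in the last loop iteration."""
    for name in files:
        label = name.split("_")[0]  # bound only if the loop body runs
    # No block scope: `label` is still visible here once the loop has finished.
    return label


print(label_after_loop(["h3k27me3_R1.txt", "igg_R1.txt"]))  # igg

try:
    label_after_loop([])  # empty input: the body never runs, `label` is never bound
except NameError as err:
    # Inside a function this is UnboundLocalError (a NameError subclass);
    # at module level, as in calc_frag_hist.py, it is a plain NameError.
    print(type(err).__name__)  # UnboundLocalError
```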

As I read it, the error only makes sense if the loop is never entered, i.e. if list(range(len(dt_frag_list))) is empty, which happens exactly when the glob matched no files.

Does that make any sense?
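
That reading can be checked with a small sketch. Assumptions: the "*len.txt" pattern is taken from the FRAG_LEN_HIST command quoted in this thread, while the helper name and the test file name are hypothetical. sorted(glob.glob(...)) returns an empty list when nothing matches, so a loop over range(len(...)) never runs and nothing inside it gets assigned.

```python
import glob
import os
import tempfile


def count_loop_iterations(pattern):
    """Hypothetical helper: count how often a loop shaped like the script's would run."""
    dt_frag_list = sorted(glob.glob(pattern))
    iterations = 0
    for i in list(range(len(dt_frag_list))):
        iterations += 1  # per-file work (and the group_short assignment) would happen here
    return iterations


# Hypothetical scenario: the glob runs in a directory with no *len.txt files.
with tempfile.TemporaryDirectory() as workdir:
    pattern = os.path.join(workdir, "*len.txt")
    print(count_loop_iterations(pattern))  # 0
    # Once a matching file exists, the loop body runs.
    open(os.path.join(workdir, "sample_R1.frag_len.txt"), "w").close()
    print(count_loop_iterations(pattern))  # 1
```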

smoe, Oct 27 '23 13:10

I reran 3.2.1 and it first failed with the same error. I then added

--- a/bin/calc_frag_hist.py
+++ b/bin/calc_frag_hist.py
@@ -25,6 +25,7 @@
 # Author: @chris-cheshire
 
 import os
+import sys
 import glob
 import argparse
 
@@ -61,6 +62,8 @@ frag_hist = None
 dt_frag_list = glob.glob(frag_path)
 dt_frag_list.sort()
 
+print("I: Iterating over "+str(len(dt_frag_list))+" files with fragments from derived from glob search '"+frag_path+"' - this number should not be 0, the argument was '"+args.frag_path+"', all executed in '"+os.getcwd()+"'")
+
 for i in list(range(len(dt_frag_list))):
     # Create dataframe from csv file for each file and save to a list of data frames
     dt_frag_i = pd.read_csv(dt_frag_list[i], sep="\t", header=None, names=["Size", "Occurrences"])

and resumed the run with -resume, fixed the Python errors in my debug line :-) and resumed again. This time it worked. I noticed that not all peak lists were cached on resume; some files still had to be run. Could that explain the error?

To get a clearer error message, would you consider adding an extra debug message, or even an early exit, when an expected condition is not met, such as the glob returning no results? Should I prepare a PR for that?
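
An early exit along the lines suggested could look like the following sketch. The helper name find_fragment_files and the message wording are hypothetical; only the abspath/glob/sort pattern mirrors the script excerpts quoted in this thread.

```python
import glob
import os
import sys


def find_fragment_files(frag_path):
    """Hypothetical helper: return sorted glob matches or stop with a clear error."""
    pattern = os.path.abspath(frag_path)
    matches = sorted(glob.glob(pattern))
    if not matches:
        # sys.exit() with a string prints it to stderr and exits with status 1,
        # so the task fails with an explicit reason instead of a NameError.
        sys.exit(
            f"ERROR: no files match '{frag_path}' (expanded to '{pattern}') "
            f"in '{os.getcwd()}'; expected at least one fragment-length file."
        )
    return matches
```

In the script, a call such as dt_frag_list = find_fragment_files(args.frag_path) would replace the glob and sort lines, so an empty match set stops the task before the loop rather than surfacing later as a NameError or TypeError.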

smoe, Oct 30 '23 23:10