cutandrun
Version 3.2 does not run, fails in frag histogram step
Description of the bug
Running with the same arguments works with version 3.1, but 3.2 fails with the following error:
frag_hist["group"] = group_short
NameError: name 'group_short' is not defined
Command used and terminal output
nextflow run nf-core/cutandrun \
-r 3.2 \
-profile singularity \
--input /scratch/project_2001539/Sami/nfcore_cutandrun/sample_sheets/sample_sheet.040723.csv \
--outdir /scratch/project_2001539/Sami/nfcore_cutandrun/outdir/out_210923 \
--genome mm10 \
--blacklist ~/metadata/mm10-blacklist.v2.bed \
--peakcaller seacr,MACS2 \
--normalisation_mode CPM \
--bowtie2 /scratch/project_2004451/refgenomes/alias/mm10/bowtie2_index/default/ \
--minimum_alignment_q_score 30 \
--replicate_threshold 2 \
Error:
frag_hist["group"] = group_short
NameError: name 'group_short' is not defined
Relevant files
No response
System information
Nextflow version 23.04.3, Slurm, HPC
Just echoing this: I got the exact same error running nf-core/cutandrun v3.2-g506a325 on Nextflow 23.04.3 in a Slurm/HPC setting.
Hi, I am about to release patch 3.2.1; could you try it once it is out?
Sure. I actually just had some C&R data come off the sequencer on Friday, so I planned to run the pipeline this week anyway. I'll follow up in the next few days.
I came up with the following patch:
--- a/bin/calc_frag_hist.py
+++ b/bin/calc_frag_hist.py
@@ -57,6 +57,9 @@ print("Calclate fragment histogram")
frag_path = os.path.abspath(args.frag_path)
frag_hist = None
+group_short = None
+rep_short = None
+
# Create list of deeptools raw fragment files
dt_frag_list = glob.glob(frag_path)
dt_frag_list.sort()
but I'm happy to test other suggestions.
With those variables initialised, the error changes to:
ERROR ~ Error executing process > 'NFCORE_CUTANDRUN:CUTANDRUN:FRAG_LEN_HIST'
Caused by:
Process `NFCORE_CUTANDRUN:CUTANDRUN:FRAG_LEN_HIST` terminated with an error exit status (1)
Command executed:
calc_frag_hist.py \
--frag_path "*len.txt" \
--output frag_len_hist.txt
if [ -f "frag_len_hist.txt" ]; then
cat frag_len_header.txt frag_len_hist.txt > frag_len_mqc.yml
fi
cat <<-END_VERSIONS > versions.yml
"NFCORE_CUTANDRUN:CUTANDRUN:FRAG_LEN_HIST":
python: $(python --version | grep -E -o "([0-9]{1,}\.)+[0-9]{1,}")
numpy: $(python -c 'import numpy; print(numpy.__version__)')
pandas: $(python -c 'import pandas; print(pandas.__version__)')
seaborn: $(python -c 'import seaborn; print(seaborn.__version__)')
END_VERSIONS
Command exit status:
1
Command output:
Calclate fragment histogram
Command error:
Calclate fragment histogram
Traceback (most recent call last):
File "/home/sm718/.nextflow/assets/nf-core/cutandrun/bin/calc_frag_hist.py", line 117, in <module>
frag_hist["group"] = group_short
TypeError: 'NoneType' object does not support item assignment
No one else is reporting these errors; this is strange!
I just sent a batch job to our cluster with v3.2.1; let's see what happens.
Cannot find revision 3.2.1
-- Make sure that it exists in the remote repository https://github.com/nf-core/cutandrun
Can I call this with explicit -r 3.2.1? Or is it still just 3.2?
3.2.1 is out.
nextflow pull cutandrun just performed the update for me. Still running :-/
My first thought was that this could be a Python-version issue with the scoping of variables that are only assigned inside the loop, but I tested this and it is not the case. That said, I am on Python 3.7.
As I read it, this can only happen if the loop body never runs, i.e. if list(range(len(dt_frag_list))) is empty.
Does that make any sense?
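A for loop over an empty list never executes its body, so names assigned only inside it are never bound; this is the same in every Python 3 version. The standalone sketch below checks that reasoning. It is a simplified, hypothetical stand-in for calc_frag_hist.py (the real loop reads each file with pandas), assuming dt_frag_list is empty because the glob matched nothing. The first variant reproduces the NameError from 3.2; the second pre-initialises the names to None, as in the patch posted earlier in this thread, and reproduces the TypeError instead.

```python
# Hypothetical, simplified stand-in for calc_frag_hist.py: the real loop reads
# each fragment file with pandas; here the loop body only binds placeholder values.
dt_frag_list = []  # what glob.glob() returns when no file matches the pattern
errors = []

# Variant 1: names bound only inside the loop (the 3.2 layout)
for i in list(range(len(dt_frag_list))):
    group_short = "group_" + str(i)
    frag_hist = {"Size": [], "Occurrences": []}
try:
    # The right-hand side is evaluated first, so group_short is looked up first
    frag_hist["group"] = group_short
except NameError as exc:
    errors.append(f"NameError: {exc}")

# Variant 2: names pre-initialised to None, as in the patch earlier in the thread
frag_hist = None
group_short = None
for i in list(range(len(dt_frag_list))):
    group_short = "group_" + str(i)
    frag_hist = {"Size": [], "Occurrences": []}
try:
    frag_hist["group"] = group_short
except TypeError as exc:
    errors.append(f"TypeError: {exc}")

for line in errors:
    print(line)
```

Running it prints the two messages seen in this thread, "NameError: name 'group_short' is not defined" and "TypeError: 'NoneType' object does not support item assignment". Initialising the names to None only moves the failure; the underlying cause is the empty file list.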
I reran 3.2.1 and it first failed with the same error; I then added
--- a/bin/calc_frag_hist.py
+++ b/bin/calc_frag_hist.py
@@ -25,6 +25,7 @@
# Author: @chris-cheshire
import os
+import sys
import glob
import argparse
@@ -61,6 +62,8 @@ frag_hist = None
dt_frag_list = glob.glob(frag_path)
dt_frag_list.sort()
+print("I: Iterating over "+str(len(dt_frag_list))+" files with fragments derived from glob search '"+frag_path+"' - this number should not be 0, the argument was '"+args.frag_path+"', all executed in '"+os.getcwd()+"'")
+
for i in list(range(len(dt_frag_list))):
# Create dataframe from csv file for each file and save to a list of data frames
dt_frag_i = pd.read_csv(dt_frag_list[i], sep="\t", header=None, names=["Size", "Occurrences"])
and ran it with -resume, fixed the Python errors in my debug line :-), and resumed again. Then it worked. I noticed that not all peak lists were cached on resume; some files still had to be processed. Could that explain the error? For a clearer error message, would you consider adding an extra debug message, or even an early exit, when an assumption is violated, such as a glob returning no results? Should I prepare a PR for that?
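A minimal sketch of the early exit proposed above, assuming the script keeps its glob-based discovery of the *len.txt files passed via --frag_path. The function names missing_files_message and find_fragment_files are hypothetical and not part of the pipeline. If the pattern matches nothing, the script stops with a non-zero exit status and a message naming the pattern and the working directory, instead of failing later at the frag_hist assignment.

```python
import glob
import os
import sys


def missing_files_message(files, pattern):
    """Return an error message if the glob found no files, else None.

    Hypothetical helper illustrating the proposed guard.
    """
    if files:
        return None
    return (
        f"ERROR: glob pattern '{pattern}' matched no fragment files "
        f"in '{os.getcwd()}'; expected at least one file."
    )


def find_fragment_files(pattern):
    """Return the sorted files matching pattern, exiting early if there are none."""
    # Mirrors the script: make the pattern absolute, glob it, sort the result
    files = sorted(glob.glob(os.path.abspath(pattern)))
    message = missing_files_message(files, pattern)
    if message is not None:
        # A string passed to sys.exit is printed to stderr; exit status is 1
        sys.exit(message)
    print(f"Found {len(files)} fragment file(s) for pattern '{pattern}'")
    return files
```

In calc_frag_hist.py this could replace the glob and sort lines, e.g. dt_frag_list = find_fragment_files(args.frag_path). Because the process exits with status 1, Nextflow would report the FRAG_LEN_HIST task as failed, but with a message that points at the missing input files rather than at group_short.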