iGenomes GRCh37 misread (?) during pureclip peakcalling

Open irikas opened this issue 2 months ago • 0 comments

Check Documentation

I have checked the following places for this error:

Description of the bug

The pureclip_peak_call process fails while loading the reference: PureCLIP cannot read genome.fa and reports an unexpected character 'M'.

Steps to reproduce

Steps to reproduce the behaviour:

  1. Command line: nextflow run nf-core/clipseq --input $MAIN_DIR/input/251106_eclip_design_50.csv --outdir $MAIN_DIR/output/20251113_042906/ERR039850 --genome GRCh37 --move_umi NNNXXXXNN --peakcaller pureclip --motif true --max_cpus 28 --max_time 72.h --email $EMAIL_IS -profile singularity
  2. See error:
nf-core/clipseq execution completed unsuccessfully!

The exit status of the task that caused the workflow execution to fail was: 1.

The full error message was:

Error executing process > 'pureclip_peak_call (ERR039850_TDP-43_SH-SY5Y_Cytoplasmic)'

Caused by:
  Process `pureclip_peak_call (ERR039850_TDP-43_SH-SY5Y_Cytoplasmic)` terminated with an error exit status (1)

Command executed:

  pureclip \
      -i ERR039850_TDP-43_SH-SY5Y_Cytoplasmic.dedup.bam \
      -bai ERR039850_TDP-43_SH-SY5Y_Cytoplasmic.dedup.bam.bai \
      -g genome.fa \
      -nt 12 \
       -bc 0 -dm 8 -iv 'all'  \
      -o "ERR039850_TDP-43_SH-SY5Y_Cytoplasmic.sigxl.bed" \
      -or "ERR039850_TDP-43_SH-SY5Y_Cytoplasmic.8nt.peaks.bed"
  
  pigz ERR039850_TDP-43_SH-SY5Y_Cytoplasmic.sigxl.bed ERR039850_TDP-43_SH-SY5Y_Cytoplasmic.8nt.peaks.bed

Command exit status:
  1

Command output:
  Protein-RNA crosslink site detection 
  ===============
  
  Created look-up table for values from -2000 to 0 with step size 0.00333333 (size: 600000).
  Loading reference ... 

Command error:
  INFO:    Converting SIF file to temporary sandbox...
  Protein-RNA crosslink site detection 
  ===============
  
  Created look-up table for values from -2000 to 0 with step size 0.00333333 (size: 600000).
  Loading reference ... 
  ERROR: Can't load reference sequence from file 'genome.fa': Unexpected character 'M' found. 
  INFO:    Cleaning up image...

Work dir:
  /data/jling2/irika/clipdata/251028_Tollervey_TDPCLIP_E-MTAB-530/input/work/2f/f2c11688915ce254359e7993c0bdb4

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

Expected behaviour

Peak calling with pureclip completes without error.

Log files

Have you provided the following extra information/files:

  • [x] The command used to run the pipeline
  • [ ] The .nextflow.log file

System

  • Hardware: HPC
  • Executor: slurm
  • OS: GNU/Linux
  • Version: 4.18.0-477.21.1.el8_8.x86_64

Nextflow Installation

  • Version: 22.10.6

Container engine

  • Engine: Singularity
  • Version: 3.8.7
  • Image tag: nf-core/clipseq v1.0.0

Additional context

I'm not sure whether this is specifically a problem with the genome file. If it is, should I cross-post to https://github.com/ewels/AWS-iGenomes/issues?
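
One way to check whether the genome file itself is responsible is to scan genome.fa for characters other than A, C, G, T and N. 'M' is a valid IUPAC ambiguity code (A or C), so the file may be well-formed FASTA that the reader nonetheless rejects. Below is a minimal Python sketch, assuming the reader accepts only A/C/G/T/N (not confirmed here) and that genome.fa is an uncompressed FASTA file; `scan_fasta` and `scan_fasta_file` are hypothetical helper names, not part of the pipeline.

```python
from collections import Counter

# Assumption: only these characters are accepted in sequence lines.
ALLOWED = "ACGTNacgtn"


def scan_fasta(lines, allowed=ALLOWED):
    """Count characters outside `allowed`, grouped by sequence record.

    `lines` is any iterable of FASTA text lines. Returns a dict mapping
    record name -> Counter of offending characters; clean records are omitted.
    """
    strip_allowed = str.maketrans("", "", allowed)  # deletes allowed characters
    offending = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # Header line: keep the first whitespace-delimited token as the name.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            continue
        leftover = line.translate(strip_allowed)
        if leftover:
            offending.setdefault(name, Counter()).update(leftover)
    return offending


def scan_fasta_file(path, allowed=ALLOWED):
    """Run scan_fasta over a FASTA file on disk."""
    with open(path) as handle:
        return scan_fasta(handle, allowed)
```

Calling `scan_fasta_file("genome.fa")` from the process work dir would list which records contain 'M' or other ambiguity codes. An empty result would suggest the problem lies elsewhere, for example in how the file was staged.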

irikas · Nov 13 '25 20:11