clipseq
iGenomes GRCh37 misread (?) during pureclip peakcalling
Check Documentation
I have checked the following places for this error:
Description of the bug
The pureclip_peak_call process fails while loading the reference: pureclip cannot read genome.fa and reports an unexpected character 'M'.
Steps to reproduce
Steps to reproduce the behaviour:
- Command line:
nextflow run nf-core/clipseq --input $MAIN_DIR/input/251106_eclip_design_50.csv --outdir $MAIN_DIR/output/20251113_042906/ERR039850 --genome GRCh37 --move_umi NNNXXXXNN --peakcaller pureclip --motif true --max_cpus 28 --max_time 72.h --email $EMAIL_IS -profile singularity
- See error:
nf-core/clipseq execution completed unsuccessfully!
The exit status of the task that caused the workflow execution to fail was: 1.
The full error message was:
Error executing process > 'pureclip_peak_call (ERR039850_TDP-43_SH-SY5Y_Cytoplasmic)'
Caused by:
Process `pureclip_peak_call (ERR039850_TDP-43_SH-SY5Y_Cytoplasmic)` terminated with an error exit status (1)
Command executed:
pureclip \
-i ERR039850_TDP-43_SH-SY5Y_Cytoplasmic.dedup.bam \
-bai ERR039850_TDP-43_SH-SY5Y_Cytoplasmic.dedup.bam.bai \
-g genome.fa \
-nt 12 \
-bc 0 -dm 8 -iv 'all' \
-o "ERR039850_TDP-43_SH-SY5Y_Cytoplasmic.sigxl.bed" \
-or "ERR039850_TDP-43_SH-SY5Y_Cytoplasmic.8nt.peaks.bed"
pigz ERR039850_TDP-43_SH-SY5Y_Cytoplasmic.sigxl.bed ERR039850_TDP-43_SH-SY5Y_Cytoplasmic.8nt.peaks.bed
Command exit status:
1
Command output:
Protein-RNA crosslink site detection
===============
Created look-up table for values from -2000 to 0 with step size 0.00333333 (size: 600000).
Loading reference ...
Command error:
INFO: Converting SIF file to temporary sandbox...
Protein-RNA crosslink site detection
===============
Created look-up table for values from -2000 to 0 with step size 0.00333333 (size: 600000).
Loading reference ...
ERROR: Can't load reference sequence from file 'genome.fa': Unexpected character 'M' found.
INFO: Cleaning up image...
Work dir:
/data/jling2/irika/clipdata/251028_Tollervey_TDPCLIP_E-MTAB-530/input/work/2f/f2c11688915ce254359e7993c0bdb4
Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
Expected behaviour
Peak calling with pureclip completes without error.
Log files
Have you provided the following extra information/files:
- [x] The command used to run the pipeline
- [ ] The .nextflow.log file
System
- Hardware: HPC
- Executor: slurm
- OS: GNU/Linux
- Version 4.18.0-477.21.1.el8_8.x86_64
Nextflow Installation
- Version: 22.10.6
Container engine
- Engine: Singularity
- Version: 3.8.7
- Image tag: nf-core/clipseq v1.0.0
Additional context
I am not sure whether this is specifically a problem with the genome file. If it is, should I cross-post to https://github.com/ewels/AWS-iGenomes/issues?
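One way to narrow down whether the genome file is involved would be to scan genome.fa for characters the peak caller might reject. The following is a minimal sketch, not something that was run for this report. It assumes the reference loader accepts only A, C, G, T and N in either case, which is an assumption rather than something confirmed by the log above. Under that assumption, an IUPAC ambiguity code such as M in the FASTA would be enough to produce the error. The function name `unexpected_fasta_chars` and the `ALLOWED` set are hypothetical names chosen for illustration.

```python
import sys
from collections import Counter
from typing import Dict, Iterable

# Assumption: the only characters the reference loader accepts.
ALLOWED = set("ACGTNacgtn")


def unexpected_fasta_chars(lines: Iterable[str], allowed=ALLOWED) -> Dict[str, Counter]:
    """Return, per sequence name, a count of characters outside `allowed`."""
    found: Dict[str, Counter] = {}
    name = "<no header>"
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            parts = line[1:].split()
            name = parts[0] if parts else "<unnamed>"
            continue
        bad = [c for c in line if c not in allowed]
        if bad:
            found.setdefault(name, Counter()).update(bad)
    return found


# Usage: python check_fasta.py genome.fa
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for seq, counts in unexpected_fasta_chars(handle).items():
            print(seq, dict(counts))
```

If this lists an M (or other non-ACGTN letters) for one or more chromosomes, the failure would more likely come from the content of the reference than from a corrupted download or staging problem. An empty result would point away from the genome file.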