error with salmon subsampling
### Description of the bug
Hello, I'm trying to run nf-core/rnaseq but it fails with an error. This is the error output:
-[nf-core/rnaseq] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:FQ_SUBSAMPLE (cs16)'
Caused by:
Process `NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:FQ_SUBSAMPLE (cs16)` terminated with an error exit status (1)
Command executed:
fq subsample \
--record-count 1000000 --seed 1 \
NG-32545_cs16_lib672230_combined_1.fastq.gz NG-32545_cs16_lib672230_combined_2.fastq.gz \
--r1-dst cs16.subsampled_R1.fastq.gz \
--r2-dst cs16.subsampled_R2.fastq.gz
cat <<-END_VERSIONS > versions.yml
"NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:FQ_SUBSAMPLE":
fq: $(echo $(fq subsample --version | sed 's/fq-subsample //g'))
END_VERSIONS
Command exit status:
1
Command output:
2023-06-12T14:24:45.652863Z INFO fq::commands::subsample: fq-subsample start
2023-06-12T14:24:45.652884Z INFO fq::commands::subsample: initializing rng from seed = 1
2023-06-12T14:24:45.652888Z INFO fq::commands::subsample: counting records
Command error:
INFO: Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
2023-06-12T14:24:45.652863Z INFO fq::commands::subsample: fq-subsample start
2023-06-12T14:24:45.652884Z INFO fq::commands::subsample: initializing rng from seed = 1
2023-06-12T14:24:45.652888Z INFO fq::commands::subsample: counting records
Error: invalid gzip header
### Command used and terminal output
/data/home/hhz036/bin/nextflow run nf-core/rnaseq -r 3.12.0 --input /data/home/hhz036/scratch/mo_rna_analysis/input.csv --outdir /data/home/hhz036/scratch/mo_rna_analysis --aligner star_rsem --extra_star_align_args "--alignIntronMax 1000000 --alignIntronMin 20 --alignMatesGapMax 1000000 --alignSJoverhangMin 8 --outFilterMismatchNmax 999 --outFilterMultimapNmax 20 --outFilterType BySJout --outFilterMismatchNoverLmax 0.1 --clip3pAdapterSeq AAAAAAAA" --star_index /data/home/hhz036/scratch/STAR_index --gencode -profile singularity --fasta /data/home/hhz036/scratch/STAR_index/GRCh38.primary_assembly.genome.fa.gz --gtf /data/home/hhz036/scratch/STAR_index/gencode.v43.annotation.gtf.gz
Launching `https://github.com/nf-core/rnaseq` [jolly_almeida] DSL2 - revision: 3bec2331ca [3.12.0]
------------------------------------------------------
,--./,-.
___ __ __ __ ___ /,-._.--~'
|\ | |__ __ / ` / \ |__) |__ } {
| \| | \__, \__/ | \ |___ \`-._,-`-,
`._,._,'
nf-core/rnaseq v3.12.0-g3bec233
------------------------------------------------------
Core Nextflow options
revision : 3.12.0
runName : jolly_almeida
containerEngine : singularity
launchDir : /data/scratch/hhz036/mo_rna_analysis
workDir : /data/scratch/hhz036/mo_rna_analysis/work
projectDir : /data/home/hhz036/.nextflow/assets/nf-core/rnaseq
userName : hhz036
profile : singularity
configFiles : /data/home/hhz036/.nextflow/assets/nf-core/rnaseq/nextflow.config
Input/output options
input : /data/home/hhz036/scratch/mo_rna_analysis/input.csv
outdir : /data/home/hhz036/scratch/mo_rna_analysis
Reference genome options
fasta : /data/home/hhz036/scratch/STAR_index/GRCh38.primary_assembly.genome.fa.gz
gtf : /data/home/hhz036/scratch/STAR_index/gencode.v43.annotation.gtf.gz
star_index : /data/home/hhz036/scratch/STAR_index
gencode : true
Alignment options
aligner : star_rsem
extra_star_align_args: --alignIntronMax 1000000 --alignIntronMin 20 --alignMatesGapMax 1000000 --alignSJoverhangMin 8 --outFilterMismatchNmax 999 --outFilterMultimapNmax 20 --outFilterType BySJout --outFilterMismatchNoverLmax 0.1 --clip3pAdapterSeq AAAAAAAA
### Relevant files
[nextflow.log.zip](https://github.com/nf-core/rnaseq/files/11724651/nextflow.log.zip)
### System information
Hardware: HPC
Container: Singularity/Apptainer
OS: CentOS
Hi @jalilsharif! Apologies for the delay in responding. These kinds of gzip errors are a little tricky to troubleshoot. My first suspicion is that one or both of the input FastQ files are corrupted.
If you still have the data available, can you please try running the following command on both input FastQ files:
zcat <FASTQ_FILE> | head
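Since `zcat ... | head` only reads the start of each file, it may also be worth testing the whole stream with `gzip -t`. A minimal sketch, assuming a POSIX-like shell with `gzip` on the PATH; the function name `check_gzip_integrity` is just illustrative, and the file names in the usage comment are taken from the failing command above:

```bash
# Test gzip integrity of every file given as an argument.
# gzip -t decompresses the full stream without writing output and exits
# non-zero if the file is truncated, corrupted, or not gzip at all.
check_gzip_integrity() {
    rc=0
    for f in "$@"; do
        if gzip -t "$f" 2>/dev/null; then
            echo "OK      $f"
        else
            echo "FAILED  $f"
            rc=1
        fi
    done
    return "$rc"
}

# Usage (run in the directory that holds the input files):
# check_gzip_integrity \
#     NG-32545_cs16_lib672230_combined_1.fastq.gz \
#     NG-32545_cs16_lib672230_combined_2.fastq.gz
```

A `FAILED` line for either file would point to the input data rather than the pipeline.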
Please feel free to join the nf-core Slack Workspace for any future questions/issues. We have an #rnaseq channel where you can get more real-time help.
You can also check that your files are actually compressed. I've occasionally seen people rename uncompressed files with a .gz extension, which causes these sorts of problems.
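One quick way to check this is to look at the first two bytes of the file: every gzip stream starts with the magic bytes `1f 8b`. A rough sketch, assuming `head -c`, `od` and `tr` are available (they are on typical Linux systems); `is_gzip` is a hypothetical helper name, not part of any tool mentioned in this thread:

```bash
# Return success if the file starts with the gzip magic bytes 0x1f 0x8b.
is_gzip() {
    # Dump the first two bytes as hex and strip whitespace, e.g. "1f8b".
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Usage: report any *.fastq.gz file that is not really gzip-compressed.
# for f in *.fastq.gz; do
#     is_gzip "$f" || echo "not gzip: $f"
# done
```

Running `file` on the FastQ files should give similar information, since it reports "gzip compressed data" for real gzip files.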
In any case, there has been no response from the OP in several months, so I'm closing this.