error with salmon subsampling

Open jalilsharif opened this issue 2 years ago • 1 comments

Description of the bug

Hello, I am trying to run rnaseq but am getting an error. This is the error output:

-[nf-core/rnaseq] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:FQ_SUBSAMPLE (cs16)'

Caused by:
  Process `NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:FQ_SUBSAMPLE (cs16)` terminated with an error exit status (1)

Command executed:

  fq subsample \
      --record-count 1000000 --seed 1 \
      NG-32545_cs16_lib672230_combined_1.fastq.gz NG-32545_cs16_lib672230_combined_2.fastq.gz \
      --r1-dst cs16.subsampled_R1.fastq.gz \
      --r2-dst cs16.subsampled_R2.fastq.gz

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:FQ_SUBSAMPLE":
      fq: $(echo $(fq subsample --version | sed 's/fq-subsample //g'))
  END_VERSIONS

Command exit status:
  1

Command output:
  2023-06-12T14:24:45.652863Z  INFO fq::commands::subsample: fq-subsample start
  2023-06-12T14:24:45.652884Z  INFO fq::commands::subsample: initializing rng from seed = 1
  2023-06-12T14:24:45.652888Z  INFO fq::commands::subsample: counting records

Command error:
  INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
  2023-06-12T14:24:45.652863Z  INFO fq::commands::subsample: fq-subsample start
  2023-06-12T14:24:45.652884Z  INFO fq::commands::subsample: initializing rng from seed = 1
  2023-06-12T14:24:45.652888Z  INFO fq::commands::subsample: counting records
  Error: invalid gzip header

Command used and terminal output

/data/home/hhz036/bin/nextflow run nf-core/rnaseq -r 3.12.0 --input /data/home/hhz036/scratch/mo_rna_analysis/input.csv  --outdir /data/home/hhz036/scratch/mo_rna_analysis --aligner star_rsem --extra_star_align_args "--alignIntronMax 1000000 --alignIntronMin 20 --alignMatesGapMax 1000000 --alignSJoverhangMin 8 --outFilterMismatchNmax 999 --outFilterMultimapNmax 20 --outFilterType BySJout --outFilterMismatchNoverLmax 0.1 --clip3pAdapterSeq AAAAAAAA" --star_index /data/home/hhz036/scratch/STAR_index --gencode -profile singularity --fasta /data/home/hhz036/scratch/STAR_index/GRCh38.primary_assembly.genome.fa.gz --gtf /data/home/hhz036/scratch/STAR_index/gencode.v43.annotation.gtf.gz
Launching `https://github.com/nf-core/rnaseq` [jolly_almeida] DSL2 - revision: 3bec2331ca [3.12.0]


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/rnaseq v3.12.0-g3bec233
------------------------------------------------------
Core Nextflow options
  revision             : 3.12.0
  runName              : jolly_almeida
  containerEngine      : singularity
  launchDir            : /data/scratch/hhz036/mo_rna_analysis
  workDir              : /data/scratch/hhz036/mo_rna_analysis/work
  projectDir           : /data/home/hhz036/.nextflow/assets/nf-core/rnaseq
  userName             : hhz036
  profile              : singularity
  configFiles          : /data/home/hhz036/.nextflow/assets/nf-core/rnaseq/nextflow.config

Input/output options
  input                : /data/home/hhz036/scratch/mo_rna_analysis/input.csv
  outdir               : /data/home/hhz036/scratch/mo_rna_analysis

Reference genome options
  fasta                : /data/home/hhz036/scratch/STAR_index/GRCh38.primary_assembly.genome.fa.gz
  gtf                  : /data/home/hhz036/scratch/STAR_index/gencode.v43.annotation.gtf.gz
  star_index           : /data/home/hhz036/scratch/STAR_index
  gencode              : true

Alignment options
  aligner              : star_rsem
  extra_star_align_args: --alignIntronMax 1000000 --alignIntronMin 20 --alignMatesGapMax 1000000 --alignSJoverhangMin 8 --outFilterMismatchNmax 999 --outFilterMultimapNmax 20 --outFilterType BySJout --outFilterMismatchNoverLmax 0.1 --clip3pAdapterSeq AAAAAAAA


### Relevant files

[nextflow.log.zip](https://github.com/nf-core/rnaseq/files/11724651/nextflow.log.zip)


### System information

Hardware: HPC
Container: Singularity/Apptainer
OS: CentOS

jalilsharif avatar Jun 12 '23 14:06 jalilsharif

Hi @jalilsharif! Apologies for the delay in responding. These kinds of gzip errors are a little tricky to troubleshoot. My first suspicion is that one or both of the input FastQ files are corrupted.

If you still have the data available, can you please try running the following command on both input FastQ files:

zcat <FASTQ_FILE> | head
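
Note that `zcat <FASTQ_FILE> | head` only decompresses the first few lines, so it would catch a bad header but not corruption further into the file. A minimal sketch of a fuller check follows, assuming `gzip` is available on the PATH; the helper name `check_gzip` is hypothetical and not part of the pipeline:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of nf-core/rnaseq): test the integrity of
# every gzip file passed as an argument.
# `gzip -t` decompresses the whole stream without writing any output and
# exits non-zero if the header or any later block is invalid.
check_gzip() {
    local status=0
    for f in "$@"; do
        if gzip -t -- "$f" 2>/dev/null; then
            echo "OK       $f"
        else
            echo "CORRUPT  $f"
            status=1
        fi
    done
    return "$status"
}
```

For the sample in this report it could be called as `check_gzip NG-32545_cs16_lib672230_combined_1.fastq.gz NG-32545_cs16_lib672230_combined_2.fastq.gz`; a non-zero return means at least one file failed the test.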

Please feel free to join the nf-core Slack Workspace for any future questions/issues. We have an #rnaseq channel where you can get more real-time help.

drpatelh avatar Oct 15 '23 11:10 drpatelh

You can also check that your files are actually compressed. I've occasionally seen people rename uncompressed files with a .gz extension, which causes these sorts of problems.
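
One way to check this is to look at the first two bytes of each file: a gzip stream starts with the magic bytes `1f 8b`, whereas a plain-text FastQ file normally starts with `@`. The sketch below assumes `head -c` and `od` are available (they are on typical Linux systems); the helper name `is_gzip` is hypothetical:

```shell
# Hypothetical helper: succeed only if the file starts with the gzip
# magic bytes 1f 8b. It inspects the header only and does not validate
# the rest of the stream.
is_gzip() {
    magic=$(head -c 2 -- "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}
```

For example, `is_gzip NG-32545_cs16_lib672230_combined_1.fastq.gz && echo compressed || echo "not gzip"`. A file that fails this check would be consistent with the `invalid gzip header` error reported above, though it does not by itself prove that this is the cause.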

In any case, there has been no response from the OP in several months, so closing.

pinin4fjords avatar May 31 '24 08:05 pinin4fjords