star_rsem aligner fails if started from NFS

Open schaffman5 opened this issue 3 years ago • 1 comments

Description of the bug

When Nextflow is started from an NFS mount (an AWS EFS volume), the star_rsem aligner fails at the rsem-calculate-expression step. It appears that this step launches STAR, which requires a file system that supports FIFOs (named pipes). Since the NFS mount doesn't seem to support FIFOs, execution stops. The error message suggests using the STAR parameter --outTmpDir to point to a (non-NFS) Linux volume; however, rsem-calculate-expression doesn't expose this parameter when launching STAR.

I confirmed that the pipeline runs the same command successfully when started from an EBS (xfs file system) volume.

Error executing process > 'NFCORE_RNASEQ:RNASEQ:QUANTIFY_RSEM:RSEM_CALCULATEEXPRESSION (test_sample)'

Caused by:
  Process `NFCORE_RNASEQ:RNASEQ:QUANTIFY_RSEM:RSEM_CALCULATEEXPRESSION (test_sample)` terminated with an error exit status (255)

Command executed:

  INDEX=`find -L ./ -name "*.grp" | sed 's/.grp//'`
  rsem-calculate-expression \
      --num-threads 12 \
      --temporary-folder ./tmp/ \
      --strandedness reverse \
      --paired-end \
      --star --star-output-genome-bam --star-gzipped-read-file --estimate-rspd --seed 1 \
      test_sample_1_val_1.fq.gz test_sample_2_val_2.fq.gz \
      $INDEX \
      test_sample
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_RNASEQ:RNASEQ:QUANTIFY_RSEM:RSEM_CALCULATEEXPRESSION":
      rsem: $(rsem-calculate-expression --version | sed -e "s/Current version: RSEM v//g")
      star: $(STAR --version | sed -e "s/STAR_//g")
  END_VERSIONS

Command exit status:
  255

Command output:
  STAR --genomeDir ./rsem  --outSAMunmapped Within  --outFilterType BySJout  --outSAMattributes NH HI AS NM MD  --outFilterMultimapNmax 20  --outFilterMismatchNmax 999  --outFilterMismatchNoverLmax 0.04  --alignIntronMin 20  --alignIntronMax 1000000  --alignMatesGapMax 1000000  --alignSJoverhangMin 8  --alignSJDBoverhangMin 1  --sjdbScore 1  --runThreadN 12  --genomeLoad NoSharedMemory  --outSAMtype BAM Unsorted  --quantMode TranscriptomeSAM  --outSAMheaderHD @HD VN:1.4 SO:unsorted  --outFileNamePrefix ./tmp//test_sample  --readFilesCommand zcat  --readFilesIn test_sample_1_val_1.fq.gz test_sample_2_val_2.fq.gz
  "STAR --genomeDir ./rsem  --outSAMunmapped Within  --outFilterType BySJout  --outSAMattributes NH HI AS NM MD  --outFilterMultimapNmax 20  --outFilterMismatchNmax 999  --outFilterMismatchNoverLmax 0.04  --alignIntronMin 20  --alignIntronMax 1000000  --alignMatesGapMax 1000000  --alignSJoverhangMin 8  --alignSJDBoverhangMin 1  --sjdbScore 1  --runThreadN 12  --genomeLoad NoSharedMemory  --outSAMtype BAM Unsorted  --quantMode TranscriptomeSAM  --outSAMheaderHD @HD VN:1.4 SO:unsorted  --outFileNamePrefix ./tmp//test_sample  --readFilesCommand zcat  --readFilesIn test_sample_1_val_1.fq.gz test_sample_2_val_2.fq.gz" failed! Plase check if you provide correct parameters/options for the pipeline!

Command error:
  
  Exiting because of *FATAL ERROR*: could not create FIFO file ./tmp//test_sample_STARtmp/tmp.fifo.read1
  SOLUTION: check the if run directory supports FIFO files.
  If run partition does not support FIFO (e.g. Windows partitions FAT, NTFS), re-run on a Linux partition, or point --outTmpDir to a Linux partition.
  
  Feb 28 08:59:49 ...... FATAL ERROR, exiting

Command used and terminal output

nextflow run nf-core/rnaseq --input samplesheet_test.csv --genome GRCh37 --pseudo_aligner salmon --igenomes_base /data/refdata/igenomes --aligner star_rsem --salmon_quant_libtype A --salmon_index /data/refdata/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/salmon_index --max_cpus 16 --max_memory 200GB -profile docker -r 3.5

Relevant files

No response

System information

Nextflow version: 21.10.6
Hardware: AWS EC2 - 96 x Intel Xeon Platinum 8259CL CPU / 780GB RAM
Executor: local
Container: Docker
OS: RHEL 7.9 (SE Linux enabled)
Version of nf-core/rnaseq: v3.5 (revision: 646723c70f)

schaffman5 avatar Feb 28 '22 10:02 schaffman5
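
The report above traces the failure to STAR being unable to create a FIFO in the task directory. A quick way to check whether a given directory allows FIFO creation, independent of STAR or RSEM, is to try mkfifo there directly. This is only a minimal sketch: supports_fifo is a hypothetical helper name, and the check covers FIFO creation only, not anything else STAR may need from the file system.

```shell
# Sketch: test whether a directory's file system allows creating a FIFO.
# supports_fifo is a hypothetical helper, not part of STAR, RSEM, or Nextflow.
supports_fifo() {
    fifo="$1/.fifo_test.$$"
    # mkfifo fails if the directory is missing, not writable,
    # or the file system rejects named pipes.
    if mkfifo "$fifo" 2>/dev/null; then
        rm -f "$fifo"
        return 0
    fi
    return 1
}

# Check the current directory (for example the Nextflow launch or work directory).
if supports_fifo "."; then
    echo "FIFO creation works here"
else
    echo "FIFO creation failed here"
fi
```

Running it once from the NFS mount and once from the EBS volume should show whether the two locations differ in the way the STAR error suggests.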

Hi @schaffman5 ! Thanks for reporting and apologies for the delay in responding.

Yep, this is a tricky one because as you mentioned, the STAR parameters used by RSEM are hard-coded and can't be changed via the pipeline.

I am planning on updating STAR to the latest version in the next release https://github.com/BioContainers/multi-package-containers/pull/2152 but this may not fix the issue.

Did you manage to find a workaround? Wonder if changing TMPDIR would help but this would rely on STAR using this path too...🤔

You should be able to achieve this via the Nextflow env scope, as described in the Nextflow configuration documentation.

drpatelh avatar Apr 27 '22 10:04 drpatelh
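
The TMPDIR idea above could be tried roughly as follows. This sketch writes a small config file that uses the Nextflow env scope to export TMPDIR into task environments. The path /tmp/nf_local_tmp and the file name tmpdir.config are placeholders chosen for illustration. As noted in the comment, nothing in this thread confirms that STAR honours TMPDIR when launched by RSEM, and with Docker the directory would also need to be visible inside the container.

```shell
# Hypothetical local (non-NFS) directory; replace with a path that supports FIFOs.
LOCAL_TMP="${LOCAL_TMP:-/tmp/nf_local_tmp}"
# Hypothetical config file name.
CONFIG_FILE="${CONFIG_FILE:-tmpdir.config}"

mkdir -p "$LOCAL_TMP"

# Write a Nextflow config whose env scope exports TMPDIR to every task.
cat > "$CONFIG_FILE" <<EOF
env {
    TMPDIR = '$LOCAL_TMP'
}
EOF

echo "Wrote $CONFIG_FILE"
# The file would then be passed to the run with -c (not executed here):
#   nextflow run nf-core/rnaseq <other options> -c tmpdir.config
```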

Hi @schaffman5 ! Wondering whether you managed to fix this issue?

drpatelh avatar Sep 25 '22 09:09 drpatelh

Closing for now but feel free to re-open if the issue persists or if you are able to update us with how you fixed it. Thanks!

drpatelh avatar Sep 29 '22 09:09 drpatelh

Is there a recommended solution to this problem? I am facing the same issue.

ashwini-girish avatar Mar 24 '23 16:03 ashwini-girish

I was not able to find a direct solution other than to run it from a non-NFS volume.

schaffman5 avatar Mar 24 '23 19:03 schaffman5

I'm facing the same problem with CIFS-mounted storage. Has anyone found a solution to this problem?

hector-romao avatar Oct 21 '23 12:10 hector-romao
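
Since the only workaround reported in this thread is to run from a non-NFS volume, it can help to check which file system type a directory lives on before launching. The sketch below assumes GNU coreutils stat (Linux); fs_type and is_network_fstype are hypothetical helper names, and the list of patterns treated as network file systems is an illustrative guess rather than an exhaustive one.

```shell
# Sketch: report the file system type of a directory and flag network mounts.
# Assumes GNU coreutils stat; helper names are hypothetical.
fs_type() {
    # -f queries the file system, %T prints its type in human-readable form
    stat -f -c %T "$1"
}

is_network_fstype() {
    case "$1" in
        nfs*|cifs|smb*) return 0 ;;
        *) return 1 ;;
    esac
}

type_here=$(fs_type ".")
if is_network_fstype "$type_here"; then
    echo "Current directory is on a network file system ($type_here)"
else
    echo "Current directory file system: $type_here"
fi
```

If the launch directory turns out to be on a network mount, launching from a local volume matches the workaround reported above. Nextflow's -work-dir (-w) option can also place task directories on a different volume, though nobody in this thread has confirmed that this resolves the FIFO error.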