
Running this pipeline on AWS Batch

Open lwtan90 opened this issue 2 years ago • 6 comments

Description of feature

This is my first time using Nextflow for RNA-seq analysis, and I find that this pipeline works flawlessly. However, I am trying to use it on AWS Batch, and there isn't a profile made for awsbatch. Could you kindly suggest a way to run it? Should I modify the nextflow.config? Thank you for this great pipeline!

lwtan90 avatar Nov 10 '23 19:11 lwtan90

Hi lwtan90, I try to run the pipeline on AWS Batch using the profile 'docker'. This will start the pipeline.

I say 'try to' because in my case the pipeline does not run flawlessly but fails at various early steps with varying error messages, e.g.

    ERROR ~ Error executing process > 'NFCORE_RNASEQ:RNASEQ:FASTQ_FASTQC_UMITOOLS_TRIMGALORE:FASTQC (sample_name)'
    Caused by: Task failed to start - CannotCreateContainerError: Error response from daemon: devmapper: Thin Pool has 1714 free data blocks which is less than minimum required 4449 free data blocks. Create more free space in thin pool or use dm.min_free_space option to change behavior

or

    sh: line 1: 24696 Segmentation fault    pigz -p 8 -c - > another_sample_name_2_trimmed.fq.gz

I feel that I have no control over the memory available on the (spot) instances in my AWS Batch compute environment, other than requesting a specific instance type, e.g. c6a.16xlarge, which should have more than enough memory for the Trim Galore step. (Instance type 'optimal' also gave the segmentation fault at least once.)

FrankMaiwald avatar Nov 20 '23 14:11 FrankMaiwald

nf-core comes with a default AWS Batch profile called awsbatch: https://github.com/nf-core/configs/blob/master/docs/awsbatch.md
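Going by those docs, an invocation looks roughly like the sketch below. The queue name, region, and bucket are placeholders for your own setup, and the AWS Batch executor needs an S3 work directory:

```bash
# Untested sketch: run the pipeline with the shared awsbatch profile.
nextflow run nf-core/rnaseq \
    -profile awsbatch \
    --awsqueue my-nextflow-queue \
    --awsregion eu-west-1 \
    --input samplesheet.csv \
    --outdir s3://my-bucket/results \
    -work-dir s3://my-bucket/work
```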

adamrtalbot avatar Jan 08 '24 10:01 adamrtalbot

@FrankMaiwald you can adjust any and all resources per process: https://nf-co.re/docs/usage/configuration#tuning-workflow-resources
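For example, a small custom config passed with `-c` can raise the limits for one process; the selector and numbers below are illustrative, not recommended values:

```bash
# Write an override config and pass it to the run with -c.
cat > resources.config <<'EOF'
process {
    // Matches the TRIMGALORE process at the end of its full nf-core path.
    withName: '.*:TRIMGALORE' {
        cpus   = 12
        memory = 72.GB
    }
}
EOF

nextflow run nf-core/rnaseq -profile docker -c resources.config --input samplesheet.csv --outdir results
```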

adamrtalbot avatar Jan 08 '24 10:01 adamrtalbot

I hit the same issue. I think the pipeline's default resource requirements are too low, especially for disk. My AWS Batch compute environment is configured to use the c7a and m7a machine families. On my runs, the Trim Galore subworkflow gets c7a.48xlarge machines but with only 30 GB of disk, so my processes run out of disk space, which causes the errors above.

siddharthab avatar May 07 '24 16:05 siddharthab

It looks like AWS Batch does not accept per-job disk size requirements. The recommended way is to use a launch template in your compute environment. I will see if I can resolve this.

siddharthab avatar May 07 '24 18:05 siddharthab

I tried to increase the disk size in the launch template, but was perhaps not doing something right. Instead, I went for the better solution, which is scratch-less Fusion; a sketch of the relevant Nextflow settings follows.
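This is my understanding of the minimal configuration, not a verified snippet:

```bash
# Enable Fusion (which runs through Wave) and turn off local scratch.
cat > fusion.config <<'EOF'
fusion.enabled  = true
wave.enabled    = true   // Fusion is delivered via Wave containers
process.scratch = false  // let Fusion stream I/O against S3 directly
EOF
```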

For AWS Batch, you would still need to create a launch template with a user data section. This script, wrapped in MIME format and pasted into the user data section of the launch template, worked well for us. Of course, we had to restrict our compute environment to use only the *d instance types, and configure Nextflow to use Fusion without scratch.
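For anyone who has not seen one, MIME-wrapped user data has roughly the shape below. This is a generic illustration, not the exact script linked above; the device name and mount point are assumptions for an instance with a single NVMe instance-store disk.

```bash
cat > user-data.mime <<'EOF'
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==BOUNDARY=="

--==BOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
set -euo pipefail
# Format and mount the instance-store NVMe disk so jobs can use its space.
mkfs -t xfs /dev/nvme1n1
mkdir -p /scratch
mount /dev/nvme1n1 /scratch
--==BOUNDARY==--
EOF
```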

siddharthab avatar May 08 '24 03:05 siddharthab

If Fusion is not your cup of tea, you might also choose to

  • increase the boot disk size of your EC2 instances, and/or
  • use EBS (and potentially increase the EBS volume size); see the sketch below

Let me know if you run into any difficulties there. Happy to help out.
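For the EBS route, the volume size also lives in the launch template. A hedged illustration with the AWS CLI; the template name and size are placeholders, and the root device name varies by AMI:

```bash
aws ec2 create-launch-template \
    --launch-template-name nf-batch-bigdisk \
    --launch-template-data '{
        "BlockDeviceMappings": [
            {"DeviceName": "/dev/xvda",
             "Ebs": {"VolumeSize": 500, "VolumeType": "gp3", "DeleteOnTermination": true}}
        ]
    }'
```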

robsyme avatar May 29 '24 16:05 robsyme

I will close this for now, as this looks like more of a generic infrastructure issue. Please feel free to join the #rnaseq channel in the nf-core Slack Workspace or the #infrastructure-aws channel in the Nextflow Slack Workspace for more real-time help.

drpatelh avatar Jun 19 '24 08:06 drpatelh