amazon-genomics-cli
Workflow running out of memory
Describe the Bug
Worker processes are not spawning with enough memory and are not scaling; as a result, Nextflow errors with exit status 137 (not enough memory)
Steps to Reproduce
name: foo
schemaVersion: 1
workflows:
  foo:
    type:
      language: nextflow
      version: dsl2
    sourceURL: workflows/foo
contexts:
  dev:
    instanceTypes:
      - "r5.large"
    engines:
      - type: nextflow
        engine: nextflow
Child processes are spawning with 1 vCPU and 1024 MiB of memory
Relevant Logs
Main Process
2022-11-17T14:00:01.866-08:00 Version: 22.04.3 build 5703
2022-11-17T14:00:01.866-08:00 Created: 18-05-2022 19:22 UTC
2022-11-17T14:00:01.866-08:00 System: Linux 4.14.294-220.533.amzn2.x86_64
2022-11-17T14:00:01.866-08:00 Runtime: Groovy 3.0.10 on OpenJDK 64-Bit Server VM 11.0.16.1+9-LTS
2022-11-17T14:00:01.866-08:00 Encoding: UTF-8 (ANSI_X3.4-1968)
2022-11-17T14:00:01.866-08:00 Process: [email protected] [redacted]
2022-11-17T14:00:01.866-08:00 CPUs: 2 - Mem: 2 GB (1.5 GB) - Swap: 2 GB (2 GB)
2022-11-17T14:00:01.866-08:00 Nov-17 21:53:57.780 [main] WARN com.amazonaws.util.Base64 - JAXB is unavailable. Will fallback to SDK implementation which may be less performant.If you are using Java 9+, you will need to include javax.xml.bind:jaxb-api as a dependency.
2022-11-17T14:00:01.866-08:00 Nov-17 21:53:57.799 [main] DEBUG nextflow.file.FileHelper - Can't check if specified path is NFS (1): redacted
2022-11-17T14:00:01.866-08:00 Nov-17 21:53:57.799 [main] DEBUG nextflow.Session - Work-dir: redacted
2022-11-17T14:00:01.866-08:00 Nov-17 21:53:57.799 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /root/.nextflow/assets/redacted/bin
2022-11-17T14:00:01.866-08:00 Nov-17 21:53:57.871 [main] DEBUG nextflow.executor.ExecutorFactory - Extension executors providers=[AwsBatchExecutor]
2022-11-17T14:00:01.866-08:00 Nov-17 21:53:57.886 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
2022-11-17T14:00:01.866-08:00 Nov-17 21:53:57.954 [main] DEBUG nextflow.cache.CacheFactory - Using Nextflow cache factory: nextflow.cache.DefaultCacheFactory
2022-11-17T14:00:01.866-08:00 Nov-17 21:53:57.975 [main] DEBUG nextflow.util.CustomThreadPool - Creating default thread pool > poolSize: 3; maxThreads: 1000
2022-11-17T14:00:01.866-08:00 Nov-17 21:53:58.123 [main] DEBUG nextflow.Session - Session start invoked
2022-11-17T14:00:01.866-08:00 Nov-17 21:53:59.049 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Child Process
2022-11-17T14:00:01.867-08:00 Essential container in task exited - OutOfMemoryError: Container killed due to memory usage
2022-11-17T14:00:01.867-08:00 Command executed:
2022-11-17T14:00:01.867-08:00 fastp -i USDA_soil_C35-5-1_1.fastq.gz -I USDA_soil_C35-5-1_2.fastq.gz -o "USDA_soil_C35-5-1.trim.R1.fq.gz" -O "USDA_soil_C35-5-1.trim.R2.fq.gz" --length_required 50 -h "USDA_soil_C35-5-1.html" -w 16
2022-11-17T14:00:01.867-08:00 Command exit status:
2022-11-17T14:00:01.867-08:00 137
2022-11-17T14:00:01.867-08:00 Command output:
2022-11-17T14:00:01.867-08:00 (empty)
2022-11-17T14:00:01.867-08:00 Command error:
2022-11-17T14:00:01.867-08:00 .command.sh: line 2: 188 Killed fastp -i USDA_soil_C35-5-1_1.fastq.gz -I USDA_soil_C35-5-1_2.fastq.gz -o "USDA_soil_C35-5-1.trim.R1.fq.gz" -O "USDA_soil_C35-5-1.trim.R2.fq.gz" --length_required 50 -h "USDA_soil_C35-5-1.html" -w 16
Expected Behavior
Spawn processes with enough memory, or scale to meet the tasks' resource requests.
Actual Behavior
Container ran out of memory
Screenshots
Additional Context
Ran the workflow with the following command: agc workflow run foo --context dev
Operating System: Linux
AGC Version: 1.5.1
Was AGC set up with a custom bucket: no
Was AGC set up with a custom VPC: no
I am seeing similar behavior with Cromwell. I give a task 64 GB. In AWS Batch, I see the following warning next to the memory information:
Configuration conflict
This value was submitted using containerOverrides.memory which has been deprecated and was not used as an override. Instead, the MEMORY value found in the job definition’s resourceRequirements key was used instead. More information about the deprecated key can be found in the AWS Batch API documentation.
I see an "Essential container in task exited" error. However, when I click on the job definition, it appears to have 8 GB of allocated memory. Is there a different way to specify memory?
Thanks for reporting this issue. Is this an issue with the 1.5.2 release as well?
It is still an issue with v1.5.2 (Cromwell).
@spitfiredd The child processes are spawned with a default of 1 vCPU and 1024 MiB of memory. If tasks need more memory or CPU, you would typically request these through the cpus (https://www.nextflow.io/docs/latest/process.html#cpus) and memory (https://www.nextflow.io/docs/latest/process.html#memory) process directives, as in the sketch below.
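For example, here is a minimal sketch of how the fastp step from the logs above could request more resources. The FASTP process name, the input/output wiring, and the 16 CPU / 32 GB values are illustrative assumptions, not taken from the actual workflow:

process FASTP {
    // Illustrative requests; size these to what fastp actually needs.
    cpus 16
    memory '32 GB'

    input:
    tuple val(sample), path(reads)

    output:
    path "${sample}.trim.R*.fq.gz"

    script:
    """
    fastp -i ${reads[0]} -I ${reads[1]} \\
        -o "${sample}.trim.R1.fq.gz" -O "${sample}.trim.R2.fq.gz" \\
        --length_required 50 -h "${sample}.html" -w ${task.cpus}
    """
}

Passing ${task.cpus} to fastp's -w flag keeps the tool's thread count in step with the directive. Equivalent defaults can also be set for every process in nextflow.config, e.g. process.memory = '8 GB'.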
@biofilos AGC is currently using an older version of Cromwell. This older version uses the deprecated call to AWS Batch, hence the error. In our next release we will update the version of Cromwell used.
As a possible workaround, you might consider deploying a miniwdl context to run the WDL.