amazon-genomics-cli

Add Elastic Throughput Mode

Open ElDeveloper opened this issue 9 months ago • 0 comments

Description

While discussing a support case with an Amazon engineer, I was advised that Elastic Throughput would be the most cost-efficient option for my workflow. For context, my workflow is CPU-bound and does very little I/O: each Snakemake job runs for about 4 hours, reading only one 20 MB file at the beginning of the job and writing one 30 MB file at the end.

Use Case

This would be most beneficial for workflows that deal with a small amount of data. With the current version, bursting throughput would ideally work well, but because such workflows need very little storage, the amount of burst credits they accrue is small. Provisioned throughput, the other available option, is significantly more expensive.
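To make the credit problem concrete, here is a rough arithmetic sketch in Python. It assumes the bursting baseline scales at about 50 KiB/s per GiB stored, which is my recollection of the EFS documentation and should be checked against current docs. The function names and the example numbers are hypothetical.

```python
KIB_PER_MIB = 1024


def baseline_mibps(stored_gib, kib_per_s_per_gib=50.0):
    """Baseline throughput in MiB/s for a bursting-mode file system.

    The 50 KiB/s per GiB default is an assumption, not a measured value.
    """
    return stored_gib * kib_per_s_per_gib / KIB_PER_MIB


def seconds_until_credits_exhausted(credit_mib, demand_mibps, baseline):
    """Time until the credit balance hits zero at a constant demand.

    Credits only drain while demand exceeds the baseline rate.
    """
    if demand_mibps <= baseline:
        return float("inf")
    return credit_mib / (demand_mibps - baseline)


if __name__ == "__main__":
    # Hypothetical numbers: 20 GiB stored, 5 MiB/s sustained demand,
    # 10,000 MiB of accumulated credits.
    base = baseline_mibps(20)
    print(f"baseline: {base:.3f} MiB/s")
    print(f"drained after: {seconds_until_credits_exhausted(10_000, 5.0, base):.0f} s")
```

The point of the sketch is only that a small file system has a baseline well under 1 MiB/s, so even modest sustained reads drain the balance.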

Proposed Solution

Add the ability to select Elastic Throughput via a configuration parameter similar to what's done for provisioned throughput.
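Until such a parameter exists, one possible stopgap might be to switch the mode directly on the file system. The sketch below uses the boto3 EFS client's `update_file_system` call; the helper names and the file system ID are hypothetical, and I have not verified how AGC treats a file system whose mode was changed outside of it.

```python
VALID_MODES = {"bursting", "provisioned", "elastic"}


def throughput_update_params(file_system_id, mode, provisioned_mibps=None):
    """Build keyword arguments for the EFS UpdateFileSystem call."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown throughput mode: {mode!r}")
    params = {"FileSystemId": file_system_id, "ThroughputMode": mode}
    if mode == "provisioned":
        # Provisioned mode needs an explicit throughput figure.
        if provisioned_mibps is None:
            raise ValueError("provisioned mode requires provisioned_mibps")
        params["ProvisionedThroughputInMibps"] = float(provisioned_mibps)
    return params


def apply_throughput_mode(file_system_id, mode, provisioned_mibps=None):
    """Send the update to AWS; needs boto3 and valid credentials."""
    import boto3  # imported here so the helper above works without boto3

    efs = boto3.client("efs")
    return efs.update_file_system(
        **throughput_update_params(file_system_id, mode, provisioned_mibps)
    )


if __name__ == "__main__":
    # "fs-12345678" is a placeholder, not a real file system ID.
    print(throughput_update_params("fs-12345678", "elastic"))
```

A configuration parameter in AGC would still be preferable, since a manual change like this is not tracked in the project configuration.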

Other information

For additional context, I am using Snakemake and conda to manage dependencies. Per the documentation, I set the "conda prefix" to a location in "/mnt/efs" (--conda-prefix /mnt/efs/snakemake/conda). The only operations my workers execute are (1) read one 20 MB file, (2) perform a large number of calculations in R, and (3) write a 30 MB file. However, after monitoring the processes (using lsof -p PID), I noticed that each process also needs to load shared library files (for the R dependencies, etc.). Since these live on the EFS mount, loading them seems to put just enough pressure on the file system to exhaust my bursting credits, for example when running ~256 simultaneous processes.
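For anyone who wants to reproduce the check, a small Python sketch like the following could total the memory-mapped files under the EFS mount from lsof output. It assumes the default lsof column layout (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME); the sample rows, sizes, and library paths are made up for illustration and are not my actual output.

```python
# Made-up sample in the default `lsof -p PID` layout (illustration only).
SAMPLE = """\
COMMAND  PID USER  FD  TYPE DEVICE SIZE/OFF NODE NAME
R       4242 ec2  cwd   DIR  259,1     4096  256 /home/ec2
R       4242 ec2  mem   REG   0,45  1234567 1001 /mnt/efs/snakemake/conda/env/lib/R/lib/libR.so
R       4242 ec2  mem   REG  259,1  2000000 1002 /usr/lib64/libc.so.6
R       4242 ec2  mem   REG   0,45   765433 1003 /mnt/efs/snakemake/conda/env/lib/libblas.so
"""


def efs_mapped_files(lsof_text, prefix="/mnt/efs"):
    """Return (count, total_bytes) of memory-mapped files under prefix."""
    count = 0
    total = 0
    for line in lsof_text.splitlines():
        # Split into at most 9 fields so a NAME with spaces stays whole.
        fields = line.split(None, 8)
        if len(fields) < 9:
            continue
        fd, size, name = fields[3], fields[6], fields[8]
        # "mem" rows are memory-mapped files such as shared libraries.
        if fd == "mem" and name.startswith(prefix) and size.isdigit():
            count += 1
            total += int(size)
    return count, total


if __name__ == "__main__":
    n, size = efs_mapped_files(SAMPLE)
    print(f"{n} mapped files on EFS, {size} bytes per process")
```

Multiplying the per-process total by the number of simultaneous jobs gives a rough upper bound on the read volume at startup, which is what appears to drain the credits.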

ElDeveloper · Oct 02 '23 14:10