
AWS Slurm Cluster for EDA Workloads

Results 39 aws-eda-slurm-cluster issues

**Is your feature request related to a problem? Please describe.** Currently, users specify core and memory requirements for jobs so that Slurm can pick the best compute node instance type for...

**Is your feature request related to a problem? Please describe.** When the compute node AMI needs to be updated, what effect does that have on running jobs? Can it be...

**Is your feature request related to a problem? Please describe.** When multiple AZs are configured, make sure that AZ-specific queues are created.

**Is your feature request related to a problem? Please describe.** When errors occur in head node or compute node custom action scripts, the configured SNS notification should be notified like...

**Is your feature request related to a problem? Please describe.** This is an example of a node definition from ParallelCluster:
```
NodeName=od-16-gb-dy-od-16gb-1-cores-[1-1000] CPUs=1 RealMemory=15564 State=CLOUD Feature=dynamic,od-16gb-1-cores Weight=1363
NodeName=od-128-gb-dy-od-128gb-2-cores-[1-1000] CPUs=2 RealMemory=124518...
```
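
To make the fields in the node definition above easier to inspect, here is a minimal Python sketch that splits one such line into key/value pairs and counts the nodes covered by the bracketed range. The helper names `parse_node_line` and `expand_count` are hypothetical and not part of Slurm or ParallelCluster. The sketch assumes the simple `[lo-hi]` suffix shown in the example, not Slurm's full hostlist syntax.

```python
import re


def parse_node_line(line):
    """Split a slurm.conf-style node definition into a dict of string fields."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore any token that is not of the form Key=Value
            fields[key] = value
    return fields


def expand_count(node_name):
    """Count nodes covered by a trailing [lo-hi] range (1 if no range is present)."""
    match = re.search(r"\[(\d+)-(\d+)\]$", node_name)
    if not match:
        return 1
    lo, hi = int(match.group(1)), int(match.group(2))
    return hi - lo + 1


# First node definition from the example above (the second one is truncated).
line = (
    "NodeName=od-16-gb-dy-od-16gb-1-cores-[1-1000] CPUs=1 RealMemory=15564 "
    "State=CLOUD Feature=dynamic,od-16gb-1-cores Weight=1363"
)
node = parse_node_line(line)
features = node["Feature"].split(",")
print(node["NodeName"], node["CPUs"], node["RealMemory"], features)
print("nodes in range:", expand_count(node["NodeName"]))
```

All values are kept as strings. A caller that needs numeric comparisons would convert `CPUs`, `RealMemory`, and `Weight` with `int()`.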

**Is your feature request related to a problem? Please describe.** Currently, ParallelCluster only supports 50 compute resources and 50 queues. With memory-based scheduling enabled, you can only have 1...

**Is your feature request related to a problem? Please describe.** The ParallelCluster database stack currently uses static nodes instead of RDS serverless. Unclear if this will scale with cluster usage...

**Is your feature request related to a problem? Please describe.** The legacy version supported compute nodes in multiple AZs and regions. I don't think that orchestrating compute nodes in multiple...

**Is your feature request related to a problem? Please describe.** Slurm supports multiple controllers for HA. Add support for multiple controllers with each in separate AZs. **Describe the solution you'd...

**Is your feature request related to a problem? Please describe.** Newly started ParallelCluster compute nodes take at least 4-5 minutes to boot and start. This should be reduced to little...