aws-eda-slurm-cluster issues

[FEATURE] Enable exclusive scheduling by default

**Is your feature request related to a problem? Please describe.** Currently, users specify core and memory requirements for jobs so that Slurm can pick the best compute node instance type for...

cartalla
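
In stock Slurm, one way to make exclusive scheduling the default is the partition option `OverSubscribe=EXCLUSIVE` in `slurm.conf`, which allocates entire nodes to each job in that partition. The sketch below renders such a partition line. The helper `partition_line` and the partition and node names are hypothetical, and the issue does not say how aws-eda-slurm-cluster would generate this configuration.

```python
def partition_line(name: str, nodes: str, exclusive: bool = True, default: bool = False) -> str:
    """Render a slurm.conf PartitionName line (hypothetical helper)."""
    parts = [f"PartitionName={name}", f"Nodes={nodes}"]
    if default:
        parts.append("Default=YES")
    # OverSubscribe=EXCLUSIVE allocates whole nodes to jobs in this partition;
    # OverSubscribe=NO is Slurm's normal non-oversubscribed behavior.
    parts.append("OverSubscribe=EXCLUSIVE" if exclusive else "OverSubscribe=NO")
    parts.append("State=UP")
    return " ".join(parts)


print(partition_line("batch", "compute-[1-10]"))
```

Individual jobs can already request whole nodes with `sbatch --exclusive`; a partition-level setting makes that behavior the default without relying on users to pass the flag.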

[FEATURE] Document how to update compute node AMIs

**Is your feature request related to a problem? Please describe.** When the compute node AMI needs to be updated, what effect does that have on running jobs? Can it be...

cartalla

[FEATURE] Create AZ specific queues

**Is your feature request related to a problem? Please describe.** When multiple AZs are configured, make sure that AZ-specific queues are created.

cartalla
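
A minimal sketch of deriving AZ-specific queues, assuming one subnet per AZ and a hypothetical naming scheme that suffixes each existing queue name with the AZ. The function name, queue names, and subnet IDs are placeholders; ParallelCluster may also restrict queue-name length and characters, which a real implementation would have to check.

```python
def az_queues(base_queues, subnets_by_az):
    """Map each hypothetical AZ-specific queue name to the subnet it should use."""
    queues = {}
    for base in base_queues:
        for az, subnet_id in subnets_by_az.items():
            # Suffix the AZ so users can target a single AZ, e.g. "od-16gb-us-east-1a".
            queues[f"{base}-{az}"] = subnet_id
    return queues


print(az_queues(["od-16gb"], {"us-east-1a": "subnet-aaa", "us-east-1b": "subnet-bbb"}))
```

The multi-AZ queues would presumably remain as well, so jobs that do not care about placement can still run in whichever AZ has capacity.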

[FEATURE] Notify SNS when script errors occur

**Is your feature request related to a problem? Please describe.** When errors occur in head node or compute node custom action scripts, the configured SNS topic should be notified like...

cartalla
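
A minimal sketch of the requested behavior: run a custom action script and publish to SNS only if it exits non-zero. The wrapper `run_and_notify` is hypothetical; `sns_client` is assumed to be a boto3 SNS client (`boto3.client("sns")`), whose `publish` method accepts `TopicArn`, `Subject`, and `Message`. It is passed in rather than created here so the function can be exercised without AWS credentials.

```python
import socket
import subprocess


def run_and_notify(cmd, topic_arn, sns_client):
    """Run a custom action script; publish an SNS message if it exits non-zero.

    sns_client: any object with a boto3-style publish(TopicArn=, Subject=, Message=) method.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        host = socket.gethostname()
        # SNS subjects must be shorter than 100 characters and contain no line breaks.
        subject = f"Script failed on {host}"[:99]
        message = (
            f"Command: {' '.join(cmd)}\n"
            f"Exit code: {result.returncode}\n"
            # Keep only the tail of stderr so the message stays small.
            f"stderr:\n{result.stderr[-4000:]}"
        )
        sns_client.publish(TopicArn=topic_arn, Subject=subject, Message=message)
    return result.returncode
```

Returning the exit code lets the caller still fail the head node or compute node configuration step after the notification has been sent.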

[FEATURE] Add additional features to compute nodes

**Is your feature request related to a problem? Please describe.** This is an example of a node definition from ParallelCluster:

```
NodeName=od-16-gb-dy-od-16gb-1-cores-[1-1000] CPUs=1 RealMemory=15564 State=CLOUD Feature=dynamic,od-16gb-1-cores Weight=1363
NodeName=od-128-gb-dy-od-128gb-2-cores-[1-1000] CPUs=2 RealMemory=124518...
```

cartalla
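
A minimal sketch of adding features to a ParallelCluster-style `NodeName` line, using the first node definition from this issue. The helpers and the extra feature names (`x86_64`, `ondemand`) are hypothetical examples; the parser assumes whitespace-separated `key=value` tokens and does not handle quoted values containing spaces.

```python
def parse_node_line(line):
    """Split a slurm.conf NodeName line into a dict of key=value pairs (insertion order kept)."""
    return dict(token.split("=", 1) for token in line.split())


def add_features(line, extra):
    """Append extra features to the Feature= list, skipping ones already present."""
    fields = parse_node_line(line)
    features = fields["Feature"].split(",") if fields.get("Feature") else []
    for feature in extra:
        if feature not in features:
            features.append(feature)
    # Reassigning an existing key keeps its position; a new key goes at the end.
    fields["Feature"] = ",".join(features)
    return " ".join(f"{key}={value}" for key, value in fields.items())


node = ("NodeName=od-16-gb-dy-od-16gb-1-cores-[1-1000] CPUs=1 RealMemory=15564 "
        "State=CLOUD Feature=dynamic,od-16gb-1-cores Weight=1363")
print(add_features(node, ["x86_64", "ondemand"]))
```

Jobs could then select nodes by any of these features with the standard `--constraint` option.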

[FEATURE] Support more than 50 compute resources and queues

**Is your feature request related to a problem? Please describe.** Currently, ParallelCluster only supports 50 compute resources and 50 queues. With memory-based scheduling enabled, you can only have 1...

cartalla

[FEATURE] Use RDS serverless with auto-scaling for Slurm database

**Is your feature request related to a problem? Please describe.** The ParallelCluster database stack currently uses static nodes instead of RDS serverless. It is unclear whether this will scale with cluster usage...

cartalla

[FEATURE] Add multi-region support for ParallelCluster


**Is your feature request related to a problem? Please describe.** The legacy version supported compute nodes in multiple AZs and regions. I don't think that orchestrating compute nodes in multiple...

cartalla

[FEATURE] Support HA configuration

**Is your feature request related to a problem? Please describe.** Slurm supports multiple controllers for HA. Add support for multiple controllers, each in a separate AZ. **Describe the solution you'd...

cartalla
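
In `slurm.conf`, multiple controllers are declared with repeated `SlurmctldHost` lines: the first is the primary and the rest are backups, tried in order. All controllers also need access to the same `StateSaveLocation`, so that directory must be on shared storage reachable from every AZ. The sketch below renders the controller lines and enforces the one-controller-per-AZ requirement from this issue; the helper name, hostnames, and addresses are hypothetical.

```python
def slurmctld_host_lines(controllers):
    """Render SlurmctldHost lines from (hostname, address, az) tuples.

    Order matters: the first entry is the primary controller, the rest are backups.
    """
    azs = [az for _, _, az in controllers]
    if len(set(azs)) != len(azs):
        raise ValueError("each controller should be in a separate AZ")
    lines = []
    for hostname, address, _az in controllers:
        # Syntax: SlurmctldHost=<hostname>(<address>); the address part is optional.
        lines.append(f"SlurmctldHost={hostname}({address})" if address else f"SlurmctldHost={hostname}")
    return lines


print("\n".join(slurmctld_host_lines([
    ("ctl1", "10.0.1.10", "us-east-1a"),
    ("ctl2", "10.0.2.10", "us-east-1b"),
])))
```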

[FEATURE] Reduce time to start new ParallelCluster compute nodes


**Is your feature request related to a problem? Please describe.** Newly started ParallelCluster compute nodes take at least 4-5 minutes to boot and start. This should be reduced to little...

cartalla
