ParallelCluster Manager 3.2.0 fails on PERSISTENT_2 Lustre creation

Open cyberchip-wang opened this issue 2 years ago • 1 comments

ParallelCluster Manager 3.2.0, deployed by following https://www.hpcworkshops.com/03-deploy-pcm/01-deploy-pcm.html, does not propagate PerUnitStorageThroughput to the final Cluster Configuration, which causes the "Dry Run" to fail. The workaround is to manually add this line to the Cluster Configuration template file: PerUnitStorageThroughput: 125

Screenshots: Storage Properties Dry run error

URL for AWS ParallelCluster Manager: https://k0fymmi0ei.execute-api.us-east-2.amazonaws.com/home

The Lustre section in the Cluster Configuration:

    SharedStorage:
      - Name: FsxLustre0
        StorageType: FsxLustre
        MountDir: /shared
        FsxLustreSettings:
          StorageCapacity: 1200
          DeploymentType: PERSISTENT_2
          DataCompressionType: LZ4

Dry Run error:

Invalid cluster configuration. ValidationErrors: FsxPersistentOptionsValidator: Per unit storage throughput must be specified when deployment type is PERSISTENT_2.
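
The rule behind this error can be sketched in a few lines of Python. This is only an illustration of the check described in the message, not the actual FsxPersistentOptionsValidator code; the function name `find_missing_throughput` is hypothetical, and the configuration is assumed to be already parsed into a plain dict.

```python
def find_missing_throughput(config):
    """Return the names of FsxLustre entries that would trip the PERSISTENT_2 check.

    Hypothetical helper: mirrors the rule stated in the error message only.
    """
    missing = []
    for storage in config.get("SharedStorage", []):
        if storage.get("StorageType") != "FsxLustre":
            continue
        settings = storage.get("FsxLustreSettings", {})
        if (settings.get("DeploymentType") == "PERSISTENT_2"
                and "PerUnitStorageThroughput" not in settings):
            missing.append(storage.get("Name"))
    return missing


# The Lustre section from this issue, written as a dict.
config = {
    "SharedStorage": [
        {
            "Name": "FsxLustre0",
            "StorageType": "FsxLustre",
            "MountDir": "/shared",
            "FsxLustreSettings": {
                "StorageCapacity": 1200,
                "DeploymentType": "PERSISTENT_2",
                "DataCompressionType": "LZ4",
            },
        }
    ]
}

print(find_missing_throughput(config))  # ['FsxLustre0']
```

An empty list would mean that no FsxLustre entry is missing the setting.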

The issue can be reproduced by following the instructions from hpcworkshops:

1. Deploy the Pcluster Manager stack: https://www.hpcworkshops.com/03-deploy-pcm/01-deploy-pcm.html
2. Create HPC Cluster: https://www.hpcworkshops.com/06-fsx-for-lustre/01-create-cluster.html
3. Create FSx Lustre: https://www.hpcworkshops.com/06-fsx-for-lustre/02-create-cluster-fsx.html

cyberchip-wang, Aug 23 '22 18:08

Thanks for pointing out this issue; we'll fix it ASAP. Until then, you can add the following line at the very bottom of the Lustre section, indented to the same level as DataCompressionType:

PerUnitStorageThroughput: 125

sean-smith, Aug 23 '22 19:08
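
For anyone who prefers to script the workaround rather than edit the template by hand, a minimal sketch in Python might look like the following. It assumes the configuration has already been loaded into a dict; the helper name `add_throughput` is hypothetical and not part of ParallelCluster or ParallelCluster Manager, and the default of 125 is simply the value used in this thread.

```python
import copy


def add_throughput(config, throughput=125):
    """Return a copy of config with PerUnitStorageThroughput set where it is required.

    Hypothetical helper: only PERSISTENT_2 FsxLustre entries are touched,
    and an existing value is never overwritten.
    """
    patched = copy.deepcopy(config)
    for storage in patched.get("SharedStorage", []):
        settings = storage.get("FsxLustreSettings")
        if not settings:
            continue
        if settings.get("DeploymentType") == "PERSISTENT_2":
            # setdefault appends the key after DataCompressionType and
            # leaves any value that is already present untouched.
            settings.setdefault("PerUnitStorageThroughput", throughput)
    return patched


# The Lustre section from this issue, written as a dict.
original = {
    "SharedStorage": [
        {
            "Name": "FsxLustre0",
            "StorageType": "FsxLustre",
            "MountDir": "/shared",
            "FsxLustreSettings": {
                "StorageCapacity": 1200,
                "DeploymentType": "PERSISTENT_2",
                "DataCompressionType": "LZ4",
            },
        }
    ]
}

patched = add_throughput(original)
print(patched["SharedStorage"][0]["FsxLustreSettings"])
```

The function works on a deep copy, so the original configuration is left unchanged and the two can be compared before running the Dry Run again.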