pcluster-manager
Create Pcluster Manager Fails
Hi,
Using the following cluster template breaks the create interface after the Storage Tab with:

From the Chrome console I see:
framework-bb5c596eafb42b22.js:1 TypeError: Cannot read properties of undefined (reading 'length')
at index-e0a78d528f2085ff.js:1:170318
at Array.filter (<anonymous>)
at index-e0a78d528f2085ff.js:1:170289
at _a (index-e0a78d528f2085ff.js:1:170339)
at oo (framework-bb5c596eafb42b22.js:1:59416)
at Ku (framework-bb5c596eafb42b22.js:1:111716)
at Li (framework-bb5c596eafb42b22.js:1:98957)
at Ni (framework-bb5c596eafb42b22.js:1:98885)
at Pi (framework-bb5c596eafb42b22.js:1:98748)
at bi (framework-bb5c596eafb42b22.js:1:95714)
cu @ framework-bb5c596eafb42b22.js:1
main-1ae0bdeb4d020668.js:1 TypeError: Cannot read properties of undefined (reading 'length')
at index-e0a78d528f2085ff.js:1:170318
at Array.filter (<anonymous>)
at index-e0a78d528f2085ff.js:1:170289
at _a (index-e0a78d528f2085ff.js:1:170339)
at oo (framework-bb5c596eafb42b22.js:1:59416)
at Ku (framework-bb5c596eafb42b22.js:1:111716)
at Li (framework-bb5c596eafb42b22.js:1:98957)
at Ni (framework-bb5c596eafb42b22.js:1:98885)
at Pi (framework-bb5c596eafb42b22.js:1:98748)
at bi (framework-bb5c596eafb42b22.js:1:95714)
ee @ main-1ae0bdeb4d020668.js:1
main-1ae0bdeb4d020668.js:1 A client-side exception has occurred, see here for more info: https://nextjs.org/docs/messages/client-side-exception-occurred
Here's the template that broke it, with some params changed:
Region: us-east-2
Image:
  Os: alinux2
HeadNode:
  InstanceType: c6i.2xlarge
  Networking:
    SubnetId: subnet-123456789
  Ssh:
    KeyName: keypair
  DisableSimultaneousMultithreading: true
  LocalStorage:
    RootVolume:
      Size: 100
      VolumeType: gp3
  CustomActions:
    OnNodeConfigured:
      Script: >-
        https://bucket.us-east-2.amazonaws.com/headnode_install.sh
      Args:
        - HEAD
  Iam:
    AdditionalIamPolicies:
      - Policy: arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
  Dcv:
    Enabled: true
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: compute
      CapacityType: ONDEMAND
      ComputeResources:
        - Name: compute-hpc6a
          Efa:
            Enabled: true
            GdrSupport: true
          InstanceType: hpc6a.48xlarge
          MinCount: 0
          MaxCount: 101
          DisableSimultaneousMultithreading: true
      Networking:
        SubnetIds:
          - subnet-123456789
        PlacementGroup:
          Enabled: true
      CustomActions:
        OnNodeConfigured:
          Script: >-
            https://bucket.s3.us-east-2.amazonaws.com/compute_install.sh
      ComputeSettings:
        LocalStorage:
          RootVolume:
            VolumeType: gp3
SharedStorage:
  - MountDir: /opt/ncar
    Name: ncar
    StorageType: Ebs
    EbsSettings:
      Size: '35'
      VolumeType: gp3
      DeletionPolicy: Delete
  - MountDir: /scratch
    Name: scratch
    StorageType: FsxLustre
    FsxLustreSettings:
      FileSystemId: fs-0e709f43fbde2c3a2
Thank you for reaching out. This is a known issue: it occurs because you are using a template created with 3.2.0 on PCM 3.3.0.
There is no fix available at the moment.
As a workaround, you can edit the template by replacing Scheduling > SlurmQueues > ComputeResources > InstanceType with the new Instances property introduced with PC 3.3.0.
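The workaround can also be applied programmatically. The following is only a minimal sketch, not part of PCM: it assumes the template has already been parsed into plain Python dictionaries (for example with a YAML library), and that the Instances property takes a list of entries each carrying an InstanceType key. The function name `migrate_instance_types` is hypothetical.

```python
import copy


def migrate_instance_types(config: dict) -> dict:
    """Return a copy of a parsed cluster config in which each compute
    resource's flat InstanceType is moved under an Instances list.

    Assumption: Instances is a list of mappings, each with an
    InstanceType key. Resources that already define Instances are
    left untouched.
    """
    migrated = copy.deepcopy(config)  # never mutate the caller's config
    scheduling = migrated.get("Scheduling") or {}
    for queue in scheduling.get("SlurmQueues") or []:
        for resource in queue.get("ComputeResources") or []:
            if "InstanceType" in resource and "Instances" not in resource:
                instance_type = resource.pop("InstanceType")
                resource["Instances"] = [{"InstanceType": instance_type}]
    return migrated


# Example using the compute resource from the template above.
old_config = {
    "Scheduling": {
        "Scheduler": "slurm",
        "SlurmQueues": [
            {
                "Name": "compute",
                "ComputeResources": [
                    {
                        "Name": "compute-hpc6a",
                        "InstanceType": "hpc6a.48xlarge",
                        "MinCount": 0,
                        "MaxCount": 101,
                    }
                ],
            }
        ],
    }
}

new_config = migrate_instance_types(old_config)
print(new_config["Scheduling"]["SlurmQueues"][0]["ComputeResources"][0])
```

Only the compute resources are rewritten; HeadNode > InstanceType is left as-is, since the workaround above mentions only the queue-level property.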
Is there a web page that lists issues like this? These issues are not “known issues” to the customer. Perhaps a web page of known bugs would help. As a customer, this caused me some churn. Particularly when the issue is the result of a lack of backwards config file compatibility for a relatively minor update (3.2->3.3), it would be helpful to let your customers know.
@Silver-Linda
Thank you for voicing your concerns. We are working hard on adding comprehensive documentation for the product, but unfortunately we are not there yet.
We could have pointed this out in our changelog as a breaking change, and will adopt this strategy in the future, so that customers are informed of what is happening and how to work around it.
Regarding this specific issue, we are working on a fix, since we realize the app is not supposed to crash due to lack of backwards compatibility (you can follow this issue to be notified when the fix lands).
However, at the moment PCM does not support creating a cluster from a template created with previous versions. We are actively working on figuring out the best approach going forward.
@mendaomn
Thanks for the transparent reply. Pcluster manager is an extremely important aspect of parallelcluster usability but a lack of backwards compatibility (a feature pcluster manager used to have) makes it very difficult to maintain clusters while updating to any new parallelcluster versions. Not only does this impact my hopes of version control of the yaml config file, it is hard to create a yaml config file from scratch, even with pcluster manager. The parallelcluster move to a complicated yaml config file necessitated some sort of tool that helped create that file. Backwards compatibility seems critical to me.
Re-opening as current release still has this issue.
Definitely need to update the template used for this workshop to include the workaround now that the pcluster manager template defaults to 3.3.0
Also need to update the CLI instructions to install pcluster 3.3.0. Otherwise the CLI-created cluster will use 3.4.0 (currently the latest version, and therefore the default), and you can't go back and view it using the 3.3.0 version of pcluster manager. You'll get the error "Cluster hpc belongs to an incompatible ParallelCluster major version".
@natalie-white-aws my PR updates the template to resolve this issue and we'll release 3.4.0 with ParallelCluster Manager shortly.