aws-parallelcluster-post-install-scripts
aws-parallelcluster-post-install-scripts copied to clipboard
Scripts to customize AWS ParallelCluster
*Issue #, if available:* *Description of changes:* By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Can we add the following to the post-install scripts to install nvidia-container-cli? ``` curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \ && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \...
Pyxis runtime path cannot be /fsx, otherwise error to run Docker image (directly) on multiple nodes. ```console # NOTE: below works fine for -N1. $ srun -N2 --container-image=alpine grep PRETTY...
ECR is [now supported](https://github.com/NVIDIA/enroot/pull/159) lets add it to default template.
In a default Parallelcluster (3.9.1) configuration it can happen that there is no `slurmdbd.conf` file: ``` Recipe: @recipe_files::/tmp/slurm_rest_api/slurm_rest_api.rb * ruby_block[Create JWT key file] action run - execute the ruby block...
All nodes running the install script will change slurm global configuration that is shared across nodes.
The pyxis post install script is not installing Nvidia Container CLI in any case: https://github.com/aws-samples/aws-parallelcluster-post-install-scripts/blob/main/pyxis/postinstall.sh#L45-L47 Due to code line, ``` if [ $GPU_PRESENT -eq 0 ] && [ $GPU_CONTAINER_PRESENT -gt...