azhpc-images
Azure HPC/AI VM Images
I'm using the ubuntu-hpc 2204 x64 Gen 2 image on a Standard NC24ads A100 v4 VM. I train a vLLM model that uses NCCL and observe the following error: Error...
The GPU driver is using CUDA 12.2, but the installed CUDA runtime (nvcc) is 12.4.

`nvidia-smi`
```
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.08             Driver Version: 535.161.08   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
|...
```
Revert CUDA version to match driver version (12.2). This was missed in the original revert here: https://github.com/Azure/azhpc-images/pull/334 Closes: #343
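A mismatch like this can be spotted by comparing the version in the `nvidia-smi` header with the one reported by `nvcc --version`. The following is only a minimal sketch: the helper names are hypothetical, and the `nvcc` sample line in the usage note is an assumed example of its usual `release X.Y` format, not output taken from the image.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, int]


def driver_cuda_version(nvidia_smi_output: str) -> Optional[Version]:
    """Extract (major, minor) from the 'CUDA Version: X.Y' field of nvidia-smi."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def toolkit_cuda_version(nvcc_output: str) -> Optional[Version]:
    """Extract (major, minor) from the 'release X.Y' field of `nvcc --version`."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def toolkit_newer_than_driver(nvidia_smi_output: str, nvcc_output: str) -> bool:
    """Return True when the toolkit version is higher than the driver's CUDA version."""
    driver = driver_cuda_version(nvidia_smi_output)
    toolkit = toolkit_cuda_version(nvcc_output)
    if driver is None or toolkit is None:
        raise ValueError("could not parse a CUDA version from the given output")
    # Tuples compare element by element, so (12, 4) > (12, 2) is True.
    return toolkit > driver
```

The two strings could be captured with `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout` and the equivalent call for `nvcc --version`. With the header shown in the issue and an assumed `nvcc` line such as `Cuda compilation tools, release 12.4, V12.4.99`, `toolkit_newer_than_driver` would return `True`.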
The module directory is not named the same on the CentOS and Ubuntu images, which prevents the same MPI scripts from being used across platforms. On CentOS: /usr/share/**Modules**/modulefiles/mpi/ On Ubuntu :...
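Until the directories are aligned, a script could probe a list of candidate locations instead of hard-coding one. This is a rough sketch under assumptions: `find_mpi_module_dir` is a hypothetical helper, only the CentOS path comes from the issue, and the path for any other distro has to be supplied by the caller.

```python
from pathlib import Path
from typing import Iterable, Optional

# Path quoted in the issue for CentOS. Paths for other distros are not
# assumed here and must be passed in by the caller.
CENTOS_MPI_MODULE_DIR = "/usr/share/Modules/modulefiles/mpi/"


def find_mpi_module_dir(candidates: Iterable[str]) -> Optional[Path]:
    """Return the first candidate that exists as a directory, or None."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_dir():
            return path
    return None
```

A cross-platform script would call it as `find_mpi_module_dir([CENTOS_MPI_MODULE_DIR, other_distro_dir])`, where `other_distro_dir` is a placeholder for the path on the second image, and fail early if the result is `None`.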
The `mpiifort` command does not work in the AlmaLinux 8.7 image because `ifort` was not configured with the oneAPI setup. No Intel compiler is provided by default with the present...
It would be super useful to have, for each azhpc VM image on the Azure marketplace, a matching container image, say, on the [Microsoft container registry](https://mcr.microsoft.com/). This would allow using...
Hello, would it be possible to also support another RHEL 9 clone? After Red Hat changed access to the RHEL sources, AlmaLinux decided to drop the aim of being a 1:1 RHEL clone. https://almalinux.org/blog/future-of-almalinux/...
The current Ubuntu images do not include the NCCL topology file for the Standard_NC80adis_H100_v5 SKU.
NVIDIA has a known issue that prevents mounting CIFS and SMB shares when using the Mellanox OFED kernel. https://docs.nvidia.com/networking/display/mlnxofedv590560/known+issues issue: 2657392 NVIDIA states this is not something they...
run-tests.sh has by now grown a very complex structure and carries many variables for the different distros and versions. As the scripts write the component versions to a file,...