Unable to create a cluster out of an HPC Image derived from a VHD - package epel-release is not installed epel-release-7-11.noarch

Open souvik-de opened this issue 4 years ago • 3 comments

Describe the bug We have a pipeline that tests a CentOS VHD. The pipeline downloads the VHD into a storage account and creates an image from it. That image is then fed into the azhpc scripts, which deploy a cluster and run benchmarks. Before December 2020 we never had an issue with this. Now azhpc-build fails at the install_node_setup.sh step with the message "package epel-release is not installed epel-release-7-11.noarch".

To Reproduce Steps to reproduce the behavior:

  1. Have a CentOS-HPC VHD at your disposal.
  2. Download it into a storage account and create an image from it.
  3. Utilize the azhpc scripts and the image to deploy a cluster.
  4. You should encounter the error here.

Expected behavior As before Dec 2020, azhpc-build should be able to deploy a cluster from the image.

Screenshots image

Configuration (please complete the following information):

  • OS and version: CentOS 7.6 HPC (Test VHD)
  • Context of execution: Ubuntu from WSL2

souvik-de avatar Jan 27 '21 19:01 souvik-de

@edwardsp can you please have a look?

xpillons avatar Jan 28 '21 09:01 xpillons

@souvik-de this is failing only because you are unable to ssh from the jumpbox to the compute instance. Since you can connect to the jumpbox, have you tried to access the VMSS instance from there yourself? Also, does this happen consistently or just occasionally?

edwardsp avatar Jan 28 '21 14:01 edwardsp

I cannot ssh into the headnode, even after resetting with a password. The two errors I get are "Permission denied (publickey,gssapi-keyex,gssapi-with-mic)" and "Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password)". This happens consistently.

souvik-de avatar Jan 28 '21 20:01 souvik-de
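
For anyone comparing the two "Permission denied" lines quoted above: the parenthesized list in an OpenSSH denial is the set of authentication methods the server was still willing to accept. A minimal sketch for diffing the two lists follows. The helper name `offered_auth_methods` and the variable names are hypothetical, and treating the two quoted strings as an earlier and a later attempt is an assumption, not something the thread confirms.

```python
import re


def offered_auth_methods(message: str) -> list[str]:
    """Return the methods listed in an OpenSSH 'Permission denied (...)' line.

    Hypothetical helper for illustration; returns an empty list when the
    message does not contain a parenthesized method list.
    """
    match = re.search(r"Permission denied \(([^)]*)\)", message)
    if match is None:
        return []
    return [m.strip() for m in match.group(1).split(",") if m.strip()]


# The two messages quoted in the comment above, copied verbatim.
first_message = "Permission denied (publickey,gssapi-keyex,gssapi-with-mic)"
second_message = "Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password)"

# Methods that appear only in the second message.
newly_offered = [
    m for m in offered_auth_methods(second_message)
    if m not in offered_auth_methods(first_message)
]
print(newly_offered)
```

If the second message does come from a later attempt, the difference suggests the server began accepting password authentication at some point but still rejected the credentials, which would point at the account or key setup inside the image rather than at the network path from the jumpbox.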