ray-on-aml
Unable to initialize cluster
Hi,
Thank you for this library. We are trying to run the example code in an interactive environment in Azure ML. The notebook is a Python 3.8 Azure ML Jupyter notebook.
from azureml.core import Workspace, Run, Environment
from ray_on_aml.core import Ray_On_AML
ws = Workspace.from_config()
ray_on_aml = Ray_On_AML(ws=ws, compute_cluster='ray-test', additional_pip_packages=['lightgbm_ray', 'sklearn'], maxnode=4)
ray = ray_on_aml.getRay(ci_is_head=False)
The image builds correctly on Azure ML. However, we receive the following error in the notebook.
Cancel active AML runs if any
Shutting down ray if any
Found existing cluster ray-test
Waiting cluster to start and return head node ip
..............................................................................................Cluster startup failed, check detail at run
And the following error inside the experiment:
Traceback (most recent call last):
  File "source_file.py", line 103, in <module>
    startRayMaster()
  File "source_file.py", line 31, in startRayMaster
    ip = socket.gethostbyname(socket.gethostname())
socket.gaierror: [Errno -3] Temporary failure in name resolution
The error occurs whether ci_is_head is set to True or False.
All machines are inside the same VNET.
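For anyone trying to reproduce this on a node, here is a minimal diagnostic sketch. It makes the same call that fails in the traceback and, if the hostname does not resolve, falls back to asking the OS which local address it would use for outbound traffic. The function name resolve_local_ip and the probe address are hypothetical and chosen only for illustration; this is not how the library itself determines the head node IP.

```python
import socket


def resolve_local_ip(probe_addr=("10.255.255.255", 1)):
    """Return (ip, method) for the local machine.

    Hypothetical helper for diagnosis only; not part of ray-on-aml.
    """
    hostname = socket.gethostname()
    try:
        # Same call that raises socket.gaierror in the traceback above.
        return socket.gethostbyname(hostname), "dns"
    except socket.gaierror:
        pass  # The hostname is not resolvable; try the routing table instead.

    # Connecting a UDP socket sends no packets. It only makes the OS pick
    # the local address it would use to reach probe_addr.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.connect(probe_addr)
        return sock.getsockname()[0], "udp-route"
    except OSError:
        # No usable route at all; report loopback so the caller can tell.
        return "127.0.0.1", "loopback"
    finally:
        sock.close()


if __name__ == "__main__":
    ip, method = resolve_local_ip()
    print(f"hostname={socket.gethostname()} ip={ip} method={method}")
```

If this reports a method other than "dns" on the cluster nodes, the node's own hostname is not resolvable there, which would match the error above.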
Please let us know whether something is wrong with our setup or whether this is an issue with the library.
Thanks a lot!
@james-tn any support would be helpful. Thanks!
Hi, the library has moved to https://github.com/microsoft/ray-on-aml. In your compute instance, can you run pip install --upgrade ray-on-aml and then restart the kernel? Then follow the example here: https://github.com/microsoft/ray-on-aml/blob/master/examples/quick_start_examples.ipynb
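After restarting the kernel, a quick way to confirm which version the notebook actually sees is to query the installed package metadata. This is only a sketch: the helper name installed_version is hypothetical, and the distribution name "ray-on-aml" is taken from the pip command above.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent.

    Hypothetical helper; uses only the standard library (Python 3.8+).
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution name assumed from the pip command in this thread.
    print("ray-on-aml:", installed_version("ray-on-aml"))
```

If this prints None, the upgrade likely went into a different environment than the one the kernel is using.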