InnerEye-DeepLearning
`NODE_RANK` KeyError on training runs
Is there an existing issue for this?
- [X] I have searched the existing issues
Bug summary
When training models in InnerEye, the run fails with the `KeyError` shown below because of recent changes in AzureML (AML). This has been fixed in the latest hi-ml version (v0.2.5), so InnerEye-DeepLearning (IE-DL) needs to be updated to that version.
Code for reproduction
python InnerEye/ML/runner.py --model=Lung --azureml
Actual outcome
Training run fails
Error messages
  File "InnerEye/ML/runner.py", line 466, in <module>
    main()
  File "InnerEye/ML/runner.py", line 460, in main
    run(project_root=fixed_paths.repository_root_directory(),
  File "InnerEye/ML/runner.py", line 456, in run
    return runner.run()
  File "InnerEye/ML/runner.py", line 220, in run
    self.run_in_situ(azure_run_info)
  File "InnerEye/ML/runner.py", line 408, in run_in_situ
    set_environment_variables_for_multi_node()
  File "/mnt/azureml/cr/j/bc3f99f19bb745519fd9272cfd730249/exe/wd/InnerEye/Azure/azure_runner.py", line 313, in set_environment_variables_for_multi_node
    env_vars = ", ".join(f"{var} = {os.environ[var]}" for var in [ENV_MASTER_ADDR, ENV_MASTER_PORT, ENV_NODE_RANK])
  File "/mnt/azureml/cr/j/bc3f99f19bb745519fd9272cfd730249/exe/wd/InnerEye/Azure/azure_runner.py", line 313, in <genexpr>
    env_vars = ", ".join(f"{var} = {os.environ[var]}" for var in [ENV_MASTER_ADDR, ENV_MASTER_PORT, ENV_NODE_RANK])
  File "/azureml-envs/azureml_e12c14b51edf42f47eec39c741162949/lib/python3.8/os.py", line 675, in __getitem__
    raise KeyError(key) from None
KeyError: 'NODE_RANK'
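The traceback shows that the failure comes from indexing `os.environ` directly inside a string that is only used for reporting, so a single missing variable aborts the whole run. As an illustration only (this is not the change made in hi-ml v0.2.5, which is not reproduced here), a tolerant version of that formatting step might look like the sketch below. The helper name `describe_env_vars` is hypothetical, and the values of `ENV_MASTER_ADDR` and `ENV_MASTER_PORT` are assumed; only `NODE_RANK` is confirmed by the error message.

```python
import os
from typing import Iterable, Mapping, Optional

# Assumed constant values: only "NODE_RANK" is confirmed by the traceback,
# the other two are guesses based on the constant names.
ENV_MASTER_ADDR = "MASTER_ADDR"
ENV_MASTER_PORT = "MASTER_PORT"
ENV_NODE_RANK = "NODE_RANK"


def describe_env_vars(names: Iterable[str],
                      env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: format environment variables for logging.

    Uses .get() with a placeholder instead of indexing, so a variable
    that is not set is reported rather than raising KeyError.
    """
    env = os.environ if env is None else env
    return ", ".join(f"{name} = {env.get(name, '<not set>')}" for name in names)


if __name__ == "__main__":
    # Reports whatever is set in the current process environment.
    print(describe_env_vars([ENV_MASTER_ADDR, ENV_MASTER_PORT, ENV_NODE_RANK]))
```

This only avoids the crash in the logging step. Whether `NODE_RANK` should be set at all in a given AML job is a separate question that the hi-ml update addresses.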
Expected outcome
Successful training run
System info
No response