hi-ml icon indicating copy to clipboard operation
hi-ml copied to clipboard

FIX: Update env var settings for multi node

Open mebristo opened this issue 2 years ago • 4 comments

Changes in the way that Azure ML set the environment variable ENV_AZ_BATCHAI_MPI_MASTER_NODE mean that the logic for setting environment variables for multi-node jobs needs updating

mebristo avatar Sep 02 '22 07:09 mebristo

Codecov Report

Merging #588 (1d163bf) into main (1efdced) will decrease coverage by 0.17%. The diff coverage is 7.40%.

Impacted file tree graph

Flag Coverage Δ
hi-ml 81.73% <0.00%> (-0.08%) :arrow_down:
hi-ml-azure 25.26% <8.00%> (-0.21%) :arrow_down:
hi-ml-cpath 77.39% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
hi-ml/src/health_ml/runner.py 80.29% <0.00%> (-1.33%) :arrow_down:
hi-ml-azure/src/health_azure/utils.py 25.49% <8.00%> (-0.39%) :arrow_down:

codecov[bot] avatar Sep 02 '22 07:09 codecov[bot]

LGTM, but some tests are failing.

Also, @Shruthi42 is probably a much more appropriate reviewer than I am for this PR.

fepegar avatar Sep 02 '22 08:09 fepegar

LGTM, but some tests are failing.

Also, @Shruthi42 is probably a much more appropriate reviewer than I am for this PR.

Good to know, thanks!

mebristo avatar Sep 02 '22 09:09 mebristo

Looking good! I added suggestions for corner cases. Also, you had another safety check in the runner, to only call the function when we are really doing multi-node? Should we add a multi-node training job to the hi-ml test suite in a follow-up PR (happy to work on that myself)?

Good idea to add the multi-node training job in a follow-up PR

mebristo avatar Sep 02 '22 12:09 mebristo