FIX: Update env var settings for multi node

Open mebristo opened this issue 2 years ago • 4 comments

Changes in the way that Azure ML set the environment variable ENV_AZ_BATCHAI_MPI_MASTER_NODE mean that the logic for setting environment variables for multi-node jobs needs updating

Sep 02 '22 07:09 mebristo

Codecov Report

Merging #588 (1d163bf) into main (1efdced) will decrease coverage by 0.17%. The diff coverage is 7.40%.

Flag	Coverage Δ
hi-ml	`81.73% <0.00%> (-0.08%)`	:arrow_down:
hi-ml-azure	`25.26% <8.00%> (-0.21%)`	:arrow_down:
hi-ml-cpath	`77.39% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
hi-ml/src/health_ml/runner.py	`80.29% <0.00%> (-1.33%)`	:arrow_down:
hi-ml-azure/src/health_azure/utils.py	`25.49% <8.00%> (-0.39%)`	:arrow_down:

Sep 02 '22 07:09 codecov[bot]

LGTM, but some tests are failing.

Also, @Shruthi42 is probably a much more appropriate reviewer than I am for this PR.

Sep 02 '22 08:09 fepegar

LGTM, but some tests are failing.

Also, @Shruthi42 is probably a much more appropriate reviewer than I am for this PR.

Good to know, thanks!

Sep 02 '22 09:09 mebristo

Looking good! I added suggestions for corner cases. Also, you had another safety check in the runner, to only call the function when we are really doing multi-node? Should we add a multi-node training job to the hi-ml test suite in a follow-up PR (happy to work on that myself)?

Good idea to add the multi-node training job in a follow-up PR

Sep 02 '22 12:09 mebristo

hi-ml hi-ml copied to clipboard

FIX: Update env var settings for multi node

Codecov Report

hi-ml
hi-ml copied to clipboard