Test error: Distributed call failed in min-dep-os
Describe the bug
/Users/runner/work/MONAI/MONAI/monai/transforms/io/array.py:213: UserWarning: required package for reader PILReader is not installed, or the version doesn't match requirement.
  warnings.warn(
/Users/runner/work/MONAI/MONAI/monai/transforms/io/array.py:213: UserWarning: required package for reader ITKReader is not installed, or the version doesn't match requirement.
  warnings.warn(
/Users/runner/work/MONAI/MONAI/monai/transforms/io/array.py:213: UserWarning: required package for reader NrrdReader is not installed, or the version doesn't match requirement.
  warnings.warn(
/Users/runner/work/MONAI/MONAI/monai/transforms/io/array.py:213: UserWarning: required package for reader PydicomReader is not installed, or the version doesn't match requirement.
  warnings.warn(
/Users/runner/work/MONAI/MONAI/monai/transforms/utils.py:561: UserWarning: Num foregrounds 27, Num backgrounds 0, unable to generate class balanced samples, setting `pos_ratio` to 1.
  warnings.warn(
Traceback (most recent call last):
  File "/Users/runner/work/MONAI/MONAI/tests/utils.py", line 541, in _wrapper
    assert results.get(), "Distributed call failed."
AssertionError: Distributed call failed.
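For context on why the failure surfaces as a bare AssertionError rather than the underlying error: the distributed tests run the test body in a spawned child process and the parent only checks a success flag. Below is a minimal sketch of that pattern, with illustrative names; it is an approximation, not the actual code in tests/utils.py.

import torch.multiprocessing as mp


def _run_in_subprocess(func, results):
    # Runs the distributed test body in the child process and reports
    # success/failure through the queue; any exception in the child is
    # only visible to the parent as a False result.
    try:
        func()  # e.g. dist.init_process_group(...) plus the test body
        results.put(True)
    except Exception as e:
        results.put(False)
        raise e


def run_distributed_test(func):
    # Parent side: spawn the child, wait for it, then assert on the flag,
    # which is where "Distributed call failed." comes from.
    ctx = mp.get_context("spawn")
    results = ctx.Queue()
    p = ctx.Process(target=_run_in_subprocess, args=(func, results))
    p.start()
    p.join()
    assert results.get(), "Distributed call failed."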
To Reproduce
https://github.com/Project-MONAI/MONAI/actions/runs/5455742504/jobs/9927617836?pr=6623
Expected behavior
The test should pass.
Additional context
The root cause seems to be the GitHub Actions CI runner: Gloo cannot resolve the macOS runner's hostname (Mac-1688480011779.local) while initializing the process group.
test_even (tests.test_sampler_dist.DistributedSamplerTest) ... ok
Process SpawnProcess-80:
Traceback (most recent call last):
  File "/Users/runner/work/MONAI/MONAI/tests/utils.py", line 505, in run_process
    raise e
  File "/Users/runner/work/MONAI/MONAI/tests/utils.py", line 489, in run_process
    dist.init_process_group(
  File "/Users/runner/hostedtoolcache/Python/3.8.17/x64/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 907, in init_process_group
    default_pg = _new_process_group_helper(
  File "/Users/runner/hostedtoolcache/Python/3.8.17/x64/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1009, in _new_process_group_helper
    backend_class = ProcessGroupGloo(backend_prefix_store, group_rank, group_size, timeout=timeout)
RuntimeError: [enforce fail at /Users/runner/work/pytorch/pytorch/pytorch/third_party/gloo/gloo/transport/uv/device.cc:153] rp != nullptr. Unable to find address for: Mac-1688480011779.local
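If the flake really is Gloo failing to resolve the runner's .local hostname, one possible mitigation (an assumption, not verified against this runner) is to pin Gloo to the loopback interface so it never needs to look the hostname up. GLOO_SOCKET_IFNAME is the documented torch.distributed environment variable for this; lo0 is macOS's loopback interface, and the port below is illustrative.

import os

import torch.distributed as dist

# Assumption: binding Gloo to loopback sidesteps the hostname lookup that
# fails on the CI runner. "lo0" is the loopback interface name on macOS.
os.environ.setdefault("GLOO_SOCKET_IFNAME", "lo0")

dist.init_process_group(
    backend="gloo",
    init_method="tcp://127.0.0.1:12345",  # illustrative address/port
    rank=0,
    world_size=1,
)
dist.destroy_process_group()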
Are there any next steps we should take?
Let's keep this open. Currently, in most cases, manually rerunning the pipeline clears the error. If it becomes frequent, we can remove the multiprocess tests on macOS.
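For reference, a minimal sketch of what skipping the multiprocess tests on macOS could look like; the decorator placement and skip message are illustrative, only the test class name comes from the log above.

import sys
import unittest


# Skip only on macOS runners; the test stays active on Linux CI.
@unittest.skipIf(sys.platform == "darwin", "Gloo hostname resolution is flaky on macOS CI runners")
class DistributedSamplerTest(unittest.TestCase):
    def test_even(self):
        ...  # existing distributed sampler test body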