PiPPy
pytests_test_gpu(0) will fail if allocated a non-4-GPU server - add guard/skip?
While running the pytests for a recent PR, I was allocated a 3-GPU server rather than a 4-GPU one (presumably a 4-GPU server with one bad GPU, though it is unclear whether this is a new allocation option).

This odd GPU count causes the current block of pytests_gpu(0) to fail, because the device mesh attempts to reshape into a [2,2] block, which isn't possible with 3 GPUs.
i.e. error:

b/c of:
This issue tracks potentially adding an automatic world-size check that skips the tests when an unexpected config (e.g. 3 GPUs) is allocated, or else errors out with a single informative message rather than a series of failing tests.
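
A rough sketch of what such a guard could look like, not a proposal for the final implementation. The helper names `mesh_skip_reason` and `detect_gpu_count` are hypothetical and not part of PiPPy. The sketch also assumes the tests need exactly as many GPUs as the [2,2] mesh has entries:

```python
import math


def mesh_skip_reason(available_gpus, mesh_shape=(2, 2)):
    """Return a human-readable skip reason, or None if the GPUs fit the mesh.

    Hypothetical helper: assumes the tests need exactly prod(mesh_shape) GPUs.
    """
    required = math.prod(mesh_shape)
    if available_gpus == required:
        return None
    return (
        f"tests need exactly {required} GPUs for a {list(mesh_shape)} "
        f"device mesh, but {available_gpus} were allocated"
    )


def detect_gpu_count():
    """Count visible CUDA devices; report 0 if torch is not installed."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count()
```

In a test module, the reason could feed a module-level `pytest.mark.skipif(reason is not None, reason=reason or "")`, so a 3-GPU allocation produces one clear skip message. If a hard failure is preferred over a skip, the same string could be raised as an error before the tests start.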