[Bug]: Behaviour of `here.gpus[x]` when `x` is out of bounds
Description
While playing with the built-in array `here.gpus`, I discovered that the behaviour of `here.gpus[x]` when `x` is out of bounds is not clearly defined. For example, I executed this program on a system with 8 GPUs, but with only one enabled (`CHPL_RT_NUM_GPUS_PER_LOCALE=1`):
config const gpuID = 0;

proc main() {
  writeln(here.gpus);
  writeln(here.gpus.domain);
  writeln(here.gpus[gpuID], "\n");

  var A: [1..10] int;
  on here.gpus[gpuID] {
    var B: [1..10] int;
    @assertOnGpu
    foreach i in B.domain {
      B[i] = i;
    }
    A = B;
  }
  writeln(A);
}
By default (`--gpuID 0`), this program prints the expected output:
LOCALE0-GPU0
{0..0}
LOCALE0-GPU0
1 2 3 4 5 6 7 8 9 10
But for any `--gpuID x` value greater than 1, we get:
LOCALE0-GPU0
{0..0}
nil
1 2 3 4 5 6 7 8 9 10
While `nil` is probably expected because only one GPU is enabled, it seems that `@assertOnGpu` is not triggered. Is `on nil` possible?
I also extended the experiment to negative numbers, and the results seem unpredictable as I encountered at least four different outputs:
- For -1, `here.gpus[-1]` returns `here.gpus[0]` and `@assertOnGpu` is not triggered:
LOCALE0-GPU0
{0..0}
LOCALE0-GPU0
1 2 3 4 5 6 7 8 9 10
- For -2 and -3, `here.gpus[x]` returns `here.id` and `@assertOnGpu` is triggered:
LOCALE0-GPU0
{0..0}
LOCALE0
sandbox.chpl:12: error: assertOnGpu() failed
- For -4, `here.gpus[-4]` causes a segmentation fault:
LOCALE0-GPU0
{0..0}
Segmentation fault
- For -5, `here.gpus[-5]` returns `nil` and `@assertOnGpu` is triggered:
LOCALE0-GPU0
{0..0}
nil
sandbox.chpl:12: error: assertOnGpu() failed
etc.
Of course, `here.gpus` is not expected to be used that way, and these experiments are a bit sadistic. Still, I'd like to report this in case it is not a known behaviour, and I wonder whether there is an interesting explanation behind it.
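For what it's worth, a user-side guard seems to avoid all of the cases above. The following is only a minimal sketch, assuming a GPU-enabled build in which `here.gpus` is available; `isValidGpuIndex` is a hypothetical helper of mine, not something provided by Chapel. It checks the index against `here.gpus.domain` before the `on` statement is reached, so an out-of-bounds `gpuID` is rejected instead of being dereferenced:

```chapel
config const gpuID = 0;

// Hypothetical helper (not part of Chapel's standard modules):
// true only when idx is a valid index into here.gpus.
proc isValidGpuIndex(idx: int): bool {
  return here.gpus.domain.contains(idx);
}

proc main() {
  // Reject out-of-bounds indices before here.gpus[gpuID] is ever evaluated.
  if !isValidGpuIndex(gpuID) {
    writeln("gpuID ", gpuID, " is outside ", here.gpus.domain,
            ", skipping the GPU block");
    return;
  }

  var A: [1..10] int;
  on here.gpus[gpuID] {
    var B: [1..10] int;
    @assertOnGpu
    foreach i in B.domain {
      B[i] = i;
    }
    A = B;
  }
  writeln(A);
}
```

This does not explain the behaviour reported here; it only keeps the program from depending on it.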