Add MPS and XPU devices
Adds device options for MPS (Apple GPUs) and XPU (Intel GPUs), following the same pattern as the existing CUDA GPU support.
In theory there are quite a few additional devices we could add (full list here / here), but these two are of most interest based on discussions with @jatkinson1000.
I haven't been able to test the XPU device, but basic tests with MPS seem to suggest it's working as expected:
In example 2, `resnet_infer_fortran`, setting:

```fortran
model = torch_module_load(args(1), device_type=torch_kMPS)
```
without changing the input tensor device throws an error:

```
RuntimeError: slow_conv2d_forward_mps: input(device='cpu') and weight(device='mps:0') must be on the same device
```
Similarly, setting the input tensor device but not the model:

```fortran
in_tensor(1) = torch_tensor_from_array(in_data, in_layout, torch_kMPS)
```
throws an error:

```
RuntimeError: Input type (MPSFloatType) and weight type (CPUFloatType) should be the same
```
Setting both works and the expected output is produced:

```
Samoyed (id= 259 ), : probability = 0.884624064
```
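Putting the two pieces together, the working configuration looks something like the sketch below. It is adapted from the example rather than copied from it: the forward-call line is illustrative, so check `resnet_infer_fortran` for the exact interface and surrounding declarations.

```fortran
! Sketch: both the model and the input tensor placed on MPS.
! The forward-call signature here is illustrative, not verbatim.
model = torch_module_load(args(1), device_type=torch_kMPS)
in_tensor(1) = torch_tensor_from_array(in_data, in_layout, torch_kMPS)
call torch_module_forward(model, in_tensor, n_inputs, out_tensor, n_outputs)
```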
I also see spikes in activity on my GPU (for the largest spikes, I added a loop around the example inference):
Note that when running 10,000 iterations of the inference, I got an error:

```
RuntimeError: MPS backend out of memory (MPS allocated: 45.89 GB, other allocations: 9.72 MB, max allowed: 45.90 GB). Tried to allocate 784.00 KB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
```
which might suggest a problem with cleanup.
I don't think this is specific to MPS, so it might be worth checking on GPU too (you can reduce the CUDA memory limit to make debugging easier, if that helps).
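For reference, the looped inference I used is sketched below, with per-iteration cleanup added. The `torch_tensor_delete`/`torch_module_delete` calls follow the cleanup routines used in the FTorch examples; whether this cleanup is actually sufficient on MPS (or GPU) is exactly what needs checking.

```fortran
! Sketch of the looped inference that produced the activity spikes.
! The forward-call signature is illustrative, not verbatim.
do i = 1, 10000
  in_tensor(1) = torch_tensor_from_array(in_data, in_layout, torch_kMPS)
  call torch_module_forward(model, in_tensor, n_inputs, out_tensor, n_outputs)
  ! Free per-iteration allocations; if memory still grows, cleanup
  ! of device-side buffers may be incomplete.
  call torch_tensor_delete(in_tensor(1))
end do
call torch_module_delete(model)
```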
Potentially closes #127, an issue opened in relation to this PR.
Closing as superseded by #276.