Add MPS and XPU devices
Adds device options for MPS (Apple GPUs) and XPU (Intel GPUs), following the same pattern as the existing CUDA GPU support.
In theory there are quite a few additional devices we could add (full list here / here), but these two are of most interest based on discussions with @jatkinson1000.
I haven't been able to test the XPU device, but basic tests with MPS seem to suggest it's working as expected:
In example 2, `resnet_infer_fortran`, setting:

```fortran
model = torch_module_load(args(1), device_type=torch_kMPS)
```
without changing the input tensor device throws an error:

```
RuntimeError: slow_conv2d_forward_mps: input(device='cpu') and weight(device='mps:0') must be on the same device
```
Similarly, setting the input tensor device but not the model:

```fortran
in_tensor(1) = torch_tensor_from_array(in_data, in_layout, torch_kMPS)
```
throws an error:

```
RuntimeError: Input type (MPSFloatType) and weight type (CPUFloatType) should be the same
```
Setting both works and the expected output is produced:

```
Samoyed (id= 259 ), : probability = 0.884624064
```
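Putting the two pieces together, the working configuration looks something like the sketch below. It is adapted from the example rather than copied from it: the forward-call line is illustrative, so check `resnet_infer_fortran` for the exact interface and surrounding declarations.

```fortran
! Sketch: both the model and the input tensor placed on MPS.
! The forward-call signature here is illustrative, not verbatim.
model = torch_module_load(args(1), device_type=torch_kMPS)
in_tensor(1) = torch_tensor_from_array(in_data, in_layout, torch_kMPS)
call torch_module_forward(model, in_tensor, n_inputs, out_tensor, n_outputs)
```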
I also see spikes in activity on my GPU (for the largest spikes, I added a loop around the example inference):
Note that when running 10,000 iterations of the inference, I got an error:

```
RuntimeError: MPS backend out of memory (MPS allocated: 45.89 GB, other allocations: 9.72 MB, max allowed: 45.90 GB). Tried to allocate 784.00 KB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
```
which might suggest a problem with cleanup.
I don't think this is specific to MPS, so it might be worth checking on GPU too (you can reduce the CUDA memory limit to make debugging easier, if that helps).
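For reference, the looped inference I used is sketched below, with per-iteration cleanup added. The `torch_tensor_delete`/`torch_module_delete` calls follow the cleanup routines used in the FTorch examples; whether this cleanup is actually sufficient on MPS (or GPU) is exactly what needs checking.

```fortran
! Sketch of the looped inference that produced the activity spikes.
! The forward-call signature is illustrative, not verbatim.
do i = 1, 10000
  in_tensor(1) = torch_tensor_from_array(in_data, in_layout, torch_kMPS)
  call torch_module_forward(model, in_tensor, n_inputs, out_tensor, n_outputs)
  ! Free per-iteration allocations; if memory still grows, cleanup
  ! of device-side buffers may be incomplete.
  call torch_tensor_delete(in_tensor(1))
end do
call torch_module_delete(model)
```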
Potentially closes #127, an issue opened in relation to this PR.
Closing as superseded by #276.