Training on Mac MPS instead of CUDA
I'm using Apple's Metal Performance Shaders (MPS) as the GPU backend, but since I still get some warnings, I would like to confirm whether not using PyTorch automatic mixed precision has significant implications for model training. Are there any benchmark training statistics available?
With the default configuration, I get the following results for my first batches:
INFO: Starting training:
Epochs: 5
Batch size: 1
Learning rate: 1e-05
Training size: 4580
Validation size: 508
Checkpoints: True
Device: mps
Images scaling: 0.5
Mixed Precision: False
Epoch 1/5: 0%| | 0/4580 [00:00<?, ?img/s]/Users/calkoen/miniconda3/envs/torch/lib/python3.10/site-packages/torch/amp/autocast_mode.py:198: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')
Epoch 1/5: 9%| | 432/4580 [06:56<1:06:37, 1.04img/s, loss (batch)
Epoch 1/5: 20%|▏| 916/4580 [16:25<59:22, 1.03img/s, loss (batch)=1
Epoch 1/5: 10%| | 460/4580 [09:06<25:52:14, 22.61s/img, loss (batch
Epoch 1/5: 22%|▏| 1002/4580 [19:51<1:10:56, 1.19s/img, loss (batch
Epoch 1/5: 20%|▏| 918/4580 [18:10<22:55:57, 22.54s/img, loss (batch
INFO: Saved interrupt
Traceback (most recent call last):
File "/Users/calkoen/dev/Pytorch-UNet/train.py", line 265, in <module>
train_net(
File "/Users/calkoen/dev/Pytorch-UNet/train.py", line 124, in train_net
grad_scaler.scale(loss).backward()
File "/Users/calkoen/miniconda3/envs/torch/lib/python3.10/site-packages/torch/_tensor.py", line 396, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/Users/calkoen/miniconda3/envs/torch/lib/python3.10/site-packages/torch/autograd/__init__.py", line 173, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
KeyboardInterrupt
During this run, GPU utilization and memory allocation were around 70-100% and 50-80%, respectively.
Some additional info below.
I'm setting the device with:
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
print(device) # device(type='mps')
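For reference, a fuller selection helper could prefer CUDA, then MPS, then CPU (a minimal sketch; the name pick_device and the fallback ordering are my assumptions, not code from this repo):

import torch

def pick_device() -> torch.device:
    # Prefer CUDA, then Apple's MPS, then plain CPU.
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")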
I don't think mixed-precision optimizations (AMP) exist for MPS, so I train with amp=False.
However, I still get this CUDA-related warning:
/Users/calkoen/miniconda3/envs/torch/lib/python3.10/site-packages/torch/amp/autocast_mode.py:198: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')
The warning comes from this context:
with torch.cuda.amp.autocast(enabled=amp):
masks_pred = net(images)
loss = criterion(masks_pred, true_masks) + dice_loss(
F.softmax(masks_pred, dim=1).float(),
F.one_hot(true_masks, net.n_classes)
.permute(0, 3, 1, 2)
.float(),
multiclass=True,
)
# just to be sure...
print(amp) # False
# the warning can be reproduced by running:
torch.cuda.amp.autocast() # or torch.cuda.amp.autocast(enabled=False)
This actually makes sense, as this autocast subclass has the device hard-coded to "cuda":
class autocast(torch.amp.autocast_mode.autocast):
def __init__(self, enabled : bool = True, dtype : torch.dtype = torch.float16, cache_enabled : bool = True):
if torch._jit_internal.is_scripting():
self._enabled = enabled
self.device = "cuda"
self.fast_dtype = dtype
return
super().__init__("cuda", enabled=enabled, dtype=dtype, cache_enabled=cache_enabled)
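A device-aware guard along these lines should avoid the warning entirely, since it never constructs a CUDA autocast context on MPS (a minimal sketch; amp_context is a hypothetical helper of mine built on the generic torch.autocast and contextlib.nullcontext, not code from this repo):

import contextlib
import torch

def amp_context(device: torch.device, enabled: bool):
    # autocast only supports the "cuda" and "cpu" device types, so on
    # MPS return a no-op context instead of torch.cuda.amp.autocast,
    # which hard-codes device_type="cuda" and triggers the warning.
    if device.type in ("cuda", "cpu"):
        return torch.autocast(device_type=device.type, enabled=enabled)
    return contextlib.nullcontext()

# usage in the training loop:
# with amp_context(device, amp):
#     masks_pred = net(images)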
Hi, can you try the latest master? I've added a check for the MPS device in the autocast. But since autocast only supports CPU and CUDA, you should still turn AMP off.
@milesial, great, thanks. I'm currently out of office but will check it ASAP.