How to run this with MPS?
How can this code be run with MPS (i.e., on Apple M1/M2 processors)? I changed the device statement to 'mps' and added the 'mps' accelerator to the Fabric call. MPS is recognized, but at the train statement an error is thrown saying the device is not recognized by the backend.
Hm, that's weird. It works for me when I use
fabric = Fabric(accelerator="mps", devices=1)
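For context, here is a minimal, self-contained version of that setup (the Linear model and SGD optimizer are just placeholders; this assumes a recent Lightning release with MPS support):

import torch
from lightning.fabric import Fabric

# Sanity check that this PyTorch build ships the MPS backend
assert torch.backends.mps.is_available(), "MPS backend not available"

fabric = Fabric(accelerator="mps", devices=1)
fabric.launch()

# Placeholder model/optimizer; swap in your own
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model, optimizer = fabric.setup(model, optimizer)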
Maybe you have an old version from before MPS was supported. Btw, I am getting a different error: it looks like BLOOM doesn't run on MPS because some of its ops lack MPS kernels:
alibi = build_alibi_tensor(attention_mask, self.num_heads, dtype=hidden_states.dtype)
File "/Users/sebastian/miniforge3/lib/python3.10/site-packages/transformers/models/bloom/modeling_bloom.py", line 125, in build_alibi_tensor
arange_tensor = ((attention_mask.cumsum(dim=-1) - 1) * attention_mask)[:, None, :]
RuntimeError: MPS does not support cumsum op with int64 input
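If you want to experiment with BLOOM anyway, PyTorch has an opt-in CPU fallback for ops that lack MPS kernels. I haven't verified that it rescues this particular cumsum call, so treat this as a sketch:

import os
# Must be set before torch is imported; unsupported MPS ops then fall back to the CPU
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch
x = torch.arange(8, device="mps")  # int64 tensor on the MPS device
print(x.cumsum(dim=-1))  # runs on CPU via the fallback if the MPS kernel is missing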
DistilBERT should work though.
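For example, here is a quick MPS smoke test with DistilBERT (the distilbert-base-uncased checkpoint and the sequence-classification head are just illustrative choices):

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased").to("mps")

# Tokenize a sample sentence and move the batch to the MPS device
inputs = tokenizer("MPS smoke test", return_tensors="pt").to("mps")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits)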