
How to run this with MPS?

Open MatthBogaert opened this issue 2 years ago • 1 comment

How can this code be run with MPS (i.e., on Apple M1/M2 processors)? I changed the device statement to 'mps' and added the 'mps' accelerator in the Fabric call. MPS is recognized, but at the train statement an error is thrown saying the device is not recognized by the backend.
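One thing worth checking first (a minimal sketch, not part of the original repo's code): whether your PyTorch install actually exposes the MPS backend at all, before handing the accelerator to Fabric.

```python
import torch

# Sanity check: is the MPS backend both built into this PyTorch wheel
# and usable on this machine? If not, fall back to CPU.
if torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    # is_built() True but is_available() False usually means an Intel Mac
    # or a macOS version that is too old for MPS.
    device = torch.device("cpu")

print(device)
```

If this prints `cpu` on an M1/M2 machine, the problem is the PyTorch build rather than the Fabric call.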

MatthBogaert avatar Jul 06 '23 07:07 MatthBogaert

Hm, that's weird. It works for me when I use

fabric = Fabric(accelerator="mps", devices=1)

Maybe you have an old version from before MPS was supported. By the way, I am getting a different error, though: it looks like BLOOM doesn't support MPS because of some unsupported ops:

    alibi = build_alibi_tensor(attention_mask, self.num_heads, dtype=hidden_states.dtype)
  File "/Users/sebastian/miniforge3/lib/python3.10/site-packages/transformers/models/bloom/modeling_bloom.py", line 125, in build_alibi_tensor
    arange_tensor = ((attention_mask.cumsum(dim=-1) - 1) * attention_mask)[:, None, :]
RuntimeError: MPS does not support cumsum op with int64 inp
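For errors like this, one possible workaround (an assumption on my part, not something tested against this repo) is PyTorch's CPU-fallback environment variable, which routes ops the MPS backend doesn't implement, such as int64 `cumsum`, to the CPU instead of raising:

```python
import os

# Assumed workaround: tell PyTorch to fall back to CPU for ops that
# the MPS backend does not implement. This must be set before torch
# is imported anywhere in the process, or it has no effect.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

# import torch  # import torch only after the variable is set
```

This trades some speed for compatibility, since the affected ops run on CPU with extra device transfers.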

DistilBERT should work though.

rasbt avatar Jul 06 '23 17:07 rasbt