MPS MisconfigurationException: Device should be MPS, got cpu instead

Open jxtngx opened this issue 2 years ago • 3 comments

🐛 Bug

On an M1-series Mac, a Trainer with the accelerator and devices flags set to auto raises MisconfigurationException: Device should be MPS, got cpu instead.

To Reproduce

Please see core.trainer.py in the associated project.

The model is set to BoringModel at line 48, and the datamodule has been removed from trainer.fit at line 104. Additional Trainer flags are set in core.trainer.yaml.

Expected behavior

With accelerator set to auto on an M1-series Mac, the Trainer should default to the correct device.

Environment

  • CUDA:
    • GPU:
    • available: False
    • version: None
  • Packages:
    • lightning: 2022.8.2
    • lightning_app: 0.5.4
    • numpy: 1.23.1
    • pyTorch_debug: False
    • pyTorch_version: 1.12.1
    • pytorch-lightning: 1.7.0
    • tqdm: 4.64.0
  • System:
    • OS: Darwin
    • architecture:
      • 64bit
    • processor: arm
    • python: 3.9.13
    • version: Darwin Kernel Version 21.6.0: Sat Jun 18 17:07:22 PDT 2022; root:xnu-8020.140.41~1/RELEASE_ARM64_T6000

Additional context

traceback:

Global seed set to 42
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(limit_train_batches=1.0)` was configured so 100% of the batches per epoch will be used..
Error executing job with overrides: []
Traceback (most recent call last):
  File "/Users/justin/Developer/lightning/lightning-pod/lightning_pod/core/trainer.py", line 103, in main
    trainer.fit(model=model, datamodule=datamodule)
  File "/Users/justin/Developer/lightning/lightning-pod/.venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 700, in fit
    self._call_and_handle_interrupt(
  File "/Users/justin/Developer/lightning/lightning-pod/.venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 654, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/Users/justin/Developer/lightning/lightning-pod/.venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 741, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/Users/justin/Developer/lightning/lightning-pod/.venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1101, in _run
    self.strategy.setup_environment()
  File "/Users/justin/Developer/lightning/lightning-pod/.venv/lib/python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 130, in setup_environment
    self.accelerator.setup_environment(self.root_device)
  File "/Users/justin/Developer/lightning/lightning-pod/.venv/lib/python3.9/site-packages/pytorch_lightning/accelerators/mps.py", line 41, in setup_environment
    raise MisconfigurationException(f"Device should be MPS, got {root_device} instead.")
pytorch_lightning.utilities.exceptions.MisconfigurationException: Device should be MPS, got cpu instead.

cc @akihironitta @justusschock

jxtngx, Aug 09 '22 20:08

The fix is to set strategy to None.

Maybe a MisconfigurationException should follow the rank_zero_warn below? https://github.com/Lightning-AI/lightning/blob/619c2ff05827872973b2eed18d06651f7cd8bd4e/src/pytorch_lightning/trainer/trainer.py#L1797

Suggested MisconfigurationException:

if self.accelerator == "mps" and (self.strategy is not None or self.devices != 1):
    raise MisconfigurationException(
        f"When using MPS, strategy should be None and device should be 1. Got {self.strategy} and {self.devices}"
    )

jxtngx, Aug 09 '22 21:08

@JustinGoheen I couldn't reproduce it with a simple boring model: https://github.com/akihironitta/gist/blob/da3a7cd62b2219eb7af53129ad1d6b62acd90c72/pl_boring_model/main.py I haven't looked into the details yet, but could you provide a minimal script that reproduces the behaviour?

akihironitta, Aug 10 '22 00:08

All code is at https://github.com/JustinGoheen/lightning-pod/tree/bug/accelerator-mps/lightning_pod/core

The issue was that I had accelerator and devices set to auto and strategy set to single_device. single_device is not compatible with MPS; strategy needs to be set to None instead.

In other words, the MisconfigurationException I received was accurate, but it could be improved by also checking the accelerator flag against the strategy flag before calling self.strategy.setup_environment().

jxtngx, Aug 10 '22 00:08
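
For illustration, the pre-flight check proposed in the last comment might look roughly like the sketch below. This is not code from the Lightning repository: `check_mps_flags` and `MisconfigurationError` are hypothetical names, the function takes plain values rather than reading Trainer attributes, and it assumes accelerator has already been resolved from auto to a concrete name. The rule it enforces (strategy must be None and devices must be 1 on MPS) is taken from the suggestion in this thread, not from the library's documented behaviour.

```python
from typing import Optional, Union


class MisconfigurationError(Exception):
    """Hypothetical local stand-in for Lightning's MisconfigurationException."""


def check_mps_flags(
    accelerator: str,
    strategy: Optional[str],
    devices: Union[int, str],
) -> None:
    """Fail fast when MPS is combined with flags this thread reports as incompatible.

    Assumption: `accelerator` is already resolved (e.g. "auto" -> "mps").
    """
    if accelerator != "mps":
        return  # this sketch only validates the MPS case
    if strategy is not None or devices != 1:
        raise MisconfigurationError(
            "When using MPS, strategy should be None and devices should be 1. "
            f"Got strategy={strategy!r} and devices={devices!r}."
        )


# The combination reported in the issue is rejected with a descriptive message.
try:
    check_mps_flags(accelerator="mps", strategy="single_device", devices=1)
except MisconfigurationError as err:
    print(err)

# The combination that worked for the reporter passes silently.
check_mps_flags(accelerator="mps", strategy=None, devices=1)
```

Running a check like this before the strategy's environment setup would surface the conflicting flags directly, instead of the less specific "Device should be MPS, got cpu instead" error.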