MPS MisconfigurationException: Device should be MPS, got cpu instead
🐛 Bug
An M1 series Mac with the Trainer flags `accelerator` and `devices` set to `"auto"` raises `MisconfigurationException: Device should be MPS, got cpu instead`.
To Reproduce
Please see core/trainer.py of the associated project. The model is set to BoringModel at line 48, and the datamodule has been removed from trainer.fit at line 104. Additional trainer flags are set in core.trainer.yaml.
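A minimal sketch of the failing configuration, distilled from the report above (assumptions: pytorch-lightning 1.7 on Apple Silicon, with BoringModel imported from the 1.7 demos module):

import pytorch_lightning as pl
from pytorch_lightning.demos.boring_classes import BoringModel  # assumed 1.7 location

model = BoringModel()

# accelerator/devices resolve to MPS on Apple Silicon, but passing the
# "single_device" strategy string builds the strategy without an MPS root
# device, so MPSAccelerator.setup_environment() sees cpu and raises.
trainer = pl.Trainer(
    accelerator="auto",
    devices="auto",
    strategy="single_device",  # suspected culprit; leaving strategy as None avoids the error
    max_epochs=1,
)
trainer.fit(model)  # MisconfigurationException: Device should be MPS, got cpu instead.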
Expected behavior
`accelerator` set to `"auto"` on an M1 series Mac defaults to the correct device.
Environment
- CUDA:
- GPU:
- available: False
- version: None
- Packages:
- lightning: 2022.8.2
- lightning_app: 0.5.4
- numpy: 1.23.1
- pyTorch_debug: False
- pyTorch_version: 1.12.1
- pytorch-lightning: 1.7.0
- tqdm: 4.64.0
- System:
- OS: Darwin
- architecture:
- 64bit
- processor: arm
- python: 3.9.13
- version: Darwin Kernel Version 21.6.0: Sat Jun 18 17:07:22 PDT 2022; root:xnu-8020.140.41~1/RELEASE_ARM64_T6000
Additional context
traceback:
Global seed set to 42
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(limit_train_batches=1.0)` was configured so 100% of the batches per epoch will be used..
Error executing job with overrides: []
Traceback (most recent call last):
File "/Users/justin/Developer/lightning/lightning-pod/lightning_pod/core/trainer.py", line 103, in main
trainer.fit(model=model, datamodule=datamodule)
File "/Users/justin/Developer/lightning/lightning-pod/.venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 700, in fit
self._call_and_handle_interrupt(
File "/Users/justin/Developer/lightning/lightning-pod/.venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 654, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/Users/justin/Developer/lightning/lightning-pod/.venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 741, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/Users/justin/Developer/lightning/lightning-pod/.venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1101, in _run
self.strategy.setup_environment()
File "/Users/justin/Developer/lightning/lightning-pod/.venv/lib/python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 130, in setup_environment
self.accelerator.setup_environment(self.root_device)
File "/Users/justin/Developer/lightning/lightning-pod/.venv/lib/python3.9/site-packages/pytorch_lightning/accelerators/mps.py", line 41, in setup_environment
raise MisconfigurationException(f"Device should be MPS, got {root_device} instead.")
pytorch_lightning.utilities.exceptions.MisconfigurationException: Device should be MPS, got cpu instead.
cc @akihironitta @justusschock
The fix for this is that `strategy` must be None.
Maybe there should be a MisconfigurationException that follows the rank_zero_warn below? https://github.com/Lightning-AI/lightning/blob/619c2ff05827872973b2eed18d06651f7cd8bd4e/src/pytorch_lightning/trainer/trainer.py#L1797
Suggested MisconfigurationException:
if self.accelerator == "mps" and (self.strategy is not None or self.devices != 1):
    raise MisconfigurationException(
        f"When using MPS, `strategy` should be None and `devices` should be 1. Got {self.strategy} and {self.devices}."
    )
@JustinGoheen I couldn't reproduce it with a simple BoringModel (https://github.com/akihironitta/gist/blob/da3a7cd62b2219eb7af53129ad1d6b62acd90c72/pl_boring_model/main.py). I haven't looked into the details yet, but would it be possible for you to provide a minimal script to reproduce the behaviour?
All code is at https://github.com/JustinGoheen/lightning-pod/tree/bug/accelerator-mps/lightning_pod/core
The issue was that I had `accelerator` and `devices` set to `"auto"` and `strategy` set to `"single_device"`; `single_device` is not compatible with MPS here, and `strategy` instead needs to be left as None.
Meaning, the MisconfigurationException I received was accurate, but the experience could be improved by also checking the `accelerator` flag against the `strategy` flag prior to calling `self.strategy.setup_environment()`.
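For reference, a sketch of the working configuration under the same assumptions as above: omit `strategy` entirely so the accelerator connector can build the single-device MPS strategy itself.

import pytorch_lightning as pl
from pytorch_lightning.demos.boring_classes import BoringModel  # assumed 1.7 location

# With `strategy` left unset (None), accelerator="auto" selects MPS and the
# connector constructs the matching single-device strategy on its own.
trainer = pl.Trainer(accelerator="auto", devices="auto", max_epochs=1)
trainer.fit(BoringModel())

The general takeaway: leave `strategy` unset with `accelerator="auto"` unless a specific distributed strategy is required.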