Bug of `setup` for `SingleDeviceStrategy` with `LightningLite`

Open JinchaoLove opened this issue 3 years ago • 3 comments

First check

  • [X] I'm sure this is a bug.
  • [X] I've added a descriptive title to this bug.
  • [X] I've provided clear instructions on how to reproduce the bug.
  • [X] I've added a code sample.
  • [X] I've provided any other important info that is required.

Bug description

Hi there! I found a bug with SingleDeviceStrategy in LightningLite: after calling setup on a model, I expect the model's device to match the strategy's device, but it does not. Please see the linked code below to reproduce the bug.

How to reproduce the bug

colab.research.google.com

Error messages and logs


# Error messages and logs here please

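from torch import nn
from pytorch_lightning.lite import LightningLite
class EmptyLite(LightningLite):  # assumed definition; the original is in the linked Colab notebook
    def run(self):
        pass
lite = EmptyLite(accelerator="auto", strategy=None, devices='0,')
model = nn.Linear(1, 2)
lite_model = lite.setup(model)
print(lite._strategy.__class__.__name__)  # SingleDeviceStrategy
print(lite.device, lite_model.device)  # cuda:0 cpu  (!!! unexpected)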

Important info


#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0): 1.7.6
#- Lightning App Version (e.g., 0.5.2): NA
#- PyTorch Version (e.g., 1.10):  1.12.1+cu113
#- Python version (e.g., 3.9): 3.7
#- OS (e.g., Linux): Linux
#- CUDA/cuDNN version: 11.3
#- GPU models and configuration: NA
#- How you installed Lightning(`conda`, `pip`, source): pip
#- Running environment of LightningApp (e.g. local, cloud): local

More info

No response

JinchaoLove avatar Sep 21 '22 08:09 JinchaoLove

@JinchaoLove Thanks for trying Lite and reporting this issue! I have already found the problem. Don't worry, the model parameters are all on the correct device, so you should be able to train your model on the GPU without problems. The wrapper's .device property simply has not updated its value correctly. I'm preparing a fix for this.
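
One way to confirm this independently of the wrapper's .device property is to inspect the parameters directly. The helper below is a minimal sketch that uses only plain PyTorch; `parameter_devices` is a hypothetical name and not part of Lightning.

```python
import torch
from torch import nn


def parameter_devices(module: nn.Module) -> set:
    """Return the set of devices holding the module's parameters and buffers."""
    tensors = list(module.parameters()) + list(module.buffers())
    return {t.device for t in tensors}


model = nn.Linear(1, 2)  # created on CPU by default
devices = parameter_devices(model)
print(devices)
```

Applied to the wrapped model from the reproduction (`parameter_devices(lite_model)`), this should report the GPU if the parameters were moved as described, regardless of what `lite_model.device` says.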

awaelchli avatar Sep 21 '22 11:09 awaelchli

Probably a duplicate of https://github.com/Lightning-AI/lightning/issues/13108 but for Lite

carmocca avatar Sep 21 '22 11:09 carmocca

@carmocca It is not a duplicate of #13108. What is observed here is a limitation of DeviceDtypeModuleMixin, which cannot know the initial device of a module and assumes it to be on CPU.
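
The limitation can be illustrated with a small stand-in. This is a sketch under stated assumptions and not the actual DeviceDtypeModuleMixin code: `CachedDeviceWrapper` is a hypothetical class that caches its device, starts from CPU, and only refreshes the cache when a move goes through the wrapper itself. The "meta" device is used so that the example runs without a GPU.

```python
import torch
from torch import nn


class CachedDeviceWrapper(nn.Module):
    """Hypothetical wrapper that caches its device instead of inspecting parameters."""

    def __init__(self, module: nn.Module) -> None:
        super().__init__()
        self.module = module
        # Assumption: start out as CPU, because the wrapper has not
        # observed any move that happened before it was created.
        self._device = torch.device("cpu")

    @property
    def device(self) -> torch.device:
        return self._device

    def to(self, device):
        # The cache is refreshed only when the move goes through the wrapper.
        self._device = torch.device(device)
        return super().to(device)


# Case 1: the module already lives on a non-CPU device before it is wrapped.
inner = nn.Linear(1, 2, device="meta")
wrapped = CachedDeviceWrapper(inner)
reported = wrapped.device                    # cached guess
actual = next(wrapped.parameters()).device   # where the weights really are
print(reported, actual)

# Case 2: the move goes through the wrapper, so the cache stays in sync.
moved = CachedDeviceWrapper(nn.Linear(1, 2)).to("meta")
print(moved.device, next(moved.parameters()).device)
```

In the first case the reported and actual devices disagree, which matches the symptom in this issue: the weights are where they should be, and only the cached property is stale.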

awaelchli avatar Sep 21 '22 12:09 awaelchli

Dear all, thanks for the efficient⚡️ reply. Indeed, this issue does not affect running the model, thanks!

JinchaoLove avatar Sep 22 '22 00:09 JinchaoLove

Thanks @JinchaoLove! Glad to hear that, and happy to help.

awaelchli avatar Sep 22 '22 07:09 awaelchli