pytorch-lightning
Make sure the upcoming change in the default for `weights_only` from False to True is handled correctly
Bug description
Reference: https://dev-discuss.pytorch.org/t/bc-breaking-change-torch-load-is-being-flipped-to-use-weights-only-true-by-default-in-the-nightlies-after-137602/2573
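To illustrate the flip (a minimal sketch, not from the issue itself; the checkpoint path is a placeholder): Lightning checkpoints contain more than raw tensors (callback state, hyperparameters, loop state), so under PyTorch 2.6 a plain `torch.load` can raise an `UnpicklingError`, whereas the old behavior requires an explicit `weights_only=False`.

```python
import torch

ckpt_path = "example.ckpt"  # placeholder path to a Trainer-produced checkpoint

try:
    # With PyTorch >= 2.6 this is equivalent to torch.load(ckpt_path, weights_only=True)
    checkpoint = torch.load(ckpt_path)
except Exception as err:  # typically an UnpicklingError naming the blocked globals
    print(f"weights_only=True load failed: {err}")
    # Previous default behavior; only safe for checkpoints from a trusted source.
    checkpoint = torch.load(ckpt_path, weights_only=False)
```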
What version are you seeing the problem on?
master
How to reproduce the bug
-
Error messages and logs
Environment
Current environment
#- PyTorch Lightning Version (e.g., 2.4.0):
#- PyTorch Version (e.g., 2.4): 2.6+
#- Python version (e.g., 3.12):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):
More info
No response
Any progress on this? It currently prevents us from resuming training with PyTorch 2.6 and ~~lightning 2.5~~ lightning 2.3
Wanted to check this issue as well. My existing workflow of `trainer.fit(..., ckpt_path=<CKPT_PATH>)` is also currently blocked because of the default behavior change in torch 2.6.
Curious, but what's your lightning version? Support was added in lightning 2.4; I erroneously used an old env with lightning 2.3.
@ORippler, I see, thanks. I think it's actually 2.2 that I'm using. I'll try upgrading.
I am running into this exact issue when using a model-parallel strategy and resuming from a distributed checkpoint with the latest PyTorch Lightning and PyTorch 2.6. The failure occurs at line 449 of lightning/fabric/strategies/model_parallel.py and is still present on main.
Should I open up a PR, or is it already part of some other PR?
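For anyone blocked in the meantime, here is a stopgap sketch (not the eventual fix in the strategy itself): process-wide allowlisting via `torch.serialization.add_safe_globals`, available since PyTorch 2.4. The class listed below is only an illustration; allowlist whatever the `UnpicklingError` message actually reports for your checkpoint, and only for checkpoints you trust.

```python
import pathlib

import torch.serialization
from lightning.pytorch import Trainer

# Allowlist the non-tensor globals the error message complains about before
# resuming; pathlib.PosixPath is an illustrative example, not a fixed list.
torch.serialization.add_safe_globals([pathlib.PosixPath])

trainer = Trainer()  # configure the same model-parallel strategy as the original run
# trainer.fit(model, ckpt_path="path/to/checkpoint")  # placeholder path
```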
An optimal solution here might be to expose `weights_only` in `LightningModule.load_from_checkpoint`. Right now we're on torch 2.4 and I see the warning; I believe this will break once we're on torch 2.6. I'm having a hard time telling just from reading the code whether passing `weights_only=False` as a kwarg to `load_from_checkpoint` actually makes its way through to the underlying `torch.load` call.
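Until it's confirmed whether that kwarg is forwarded, a scoped workaround sketch for torch versions where the default has flipped is the `torch.serialization.safe_globals` context manager (added in PyTorch 2.5), so the relaxation applies only to this one load. `MyLitModel`, the checkpoint path, and the allowlisted class are placeholders.

```python
import argparse

import torch.serialization

from my_project import MyLitModel  # hypothetical LightningModule

# Temporarily allowlist the globals named in the UnpicklingError (argparse.Namespace
# is just an example of something stored under hyper_parameters), rather than
# relying on load_from_checkpoint forwarding weights_only to torch.load.
with torch.serialization.safe_globals([argparse.Namespace]):
    model = MyLitModel.load_from_checkpoint("path/to/last.ckpt")
```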