
Support saving and loading remote paths with FSDP

Open schmidt-ai opened this issue 9 months ago • 5 comments

Bug description

FSDPStrategy.load_checkpoint casts checkpoint_path to a pathlib.Path. This mangles URIs such as cloud checkpoint paths (e.g. s3://...), because pathlib collapses the double slash after the scheme.

Example:

from pathlib import Path

checkpoint_path = "s3://asd/asd"
# Fails: Path collapses the "//" after the scheme, so as_posix() returns "s3:/asd/asd"
assert Path(checkpoint_path).as_posix() == checkpoint_path

NOTE: I am reporting this merely by looking at the source code; I have yet to confirm this with a test.
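
To illustrate the failure mode and one possible guard, here is a minimal sketch using only the standard library. The helpers has_uri_scheme and keep_uri_as_str are hypothetical names for illustration, not Lightning APIs, and the rule "a scheme longer than one character means a URI" is an assumption. PurePosixPath stands in for Path so the result does not depend on the OS.

```python
from pathlib import PurePosixPath
from typing import Union
from urllib.parse import urlparse


def has_uri_scheme(path: str) -> bool:
    """Hypothetical helper: True if the string looks like a URI (s3://, gs://, ...)."""
    scheme = urlparse(path).scheme
    # Assumption: one-letter "schemes" are Windows drive letters, not URIs.
    return len(scheme) > 1


def keep_uri_as_str(path: str) -> Union[str, PurePosixPath]:
    """Hypothetical helper: convert only scheme-less paths to a path object."""
    return path if has_uri_scheme(path) else PurePosixPath(path)


uri = "s3://asd/asd"

# Casting the URI directly collapses the "//" that follows the scheme.
mangled = PurePosixPath(uri).as_posix()

# With the guard, the URI passes through unchanged and local paths still convert.
preserved = keep_uri_as_str(uri)
local = keep_uri_as_str("checkpoints/last.ckpt")
```

Whether the strategy should keep remote paths as strings or hand them to a filesystem abstraction is a design decision for the maintainers; the sketch only shows that the cast has to be conditional.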

What version are you seeing the problem on?

master

How to reproduce the bug

from lightning.pytorch.strategies import FSDPStrategy

FSDPStrategy(...).load_checkpoint("s3://my/checkpoint")

Error messages and logs

I believe this exception will be raised.

Environment

Current environment
#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0):
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 2.0):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):
#- Running environment of LightningApp (e.g. local, cloud):

More info

No response

cc @borda @awaelchli @carmocca

schmidt-ai, Oct 12 '23 00:10