pytorch-lightning
pytorch-lightning copied to clipboard
Automatically infer if `lit://<path>/` string is wrapped with pathlib.Path
🐛 Bug
Currently, either of these two objects needs to be passed to setattr for the framework to detect it as a storage path
- A string that starts with
lit://(or whatever protocol we support) - A storage.Path (inherited from pathlib.Path) object itself
But, it's very easy to get confused with framework's custom storage.Path class with pathlib.Path class. We should also detect if pathlib.Path("lit://<path>") comes into setattr
To Reproduce
File attribute in the work1 is a Path object but this will throw json serializable error
from lightning.app import LightningFlow
from lightning.app import LightningWork
from lightning.app import LightningApp
import pathlib
class Work1(LightningWork):
def __init__(self):
super().__init__()
self.file = None
def run(self):
print("Work1")
self.file = pathlib.Path("lit://work1.txt") # using pathlib.Path instead of storage.Path
class RootFlow(LightningFlow):
def __init__(self):
super().__init__()
self.work1 = Work1()
def run(self):
self.work1.run()
print(self.work1.file)
app = LightningApp(RootFlow())
Expected behavior
Since the Path object starts with lit:// it should auto convert to storeage.Path
Environment
- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
- PyTorch Lightning Version (e.g., 1.5.0):
- Lightning App Version (e.g., 0.5.2):
- PyTorch Version (e.g., 1.10):
- Python version (e.g., 3.9):
- OS (e.g., Linux):
- CUDA/cuDNN version:
- GPU models and configuration:
- How you installed PyTorch (
conda,pip, source): - If compiling from source, the output of
torch.__config__.show(): - Running environment of LightningApp (e.g. local, cloud):
- Any other relevant information:
Additional context
@hhsecond I don't think so. The design was that the user explicitly expresses which filepaths should be considered artifacts and be uploaded. In that sense, pathlib.Path should be interpreted the same way as a regular string path and not be uploaded to S3. Only when a string is prefixed withlit:// or explicitly wrapped with lightning.storage.Path do we back up the files.
The lit:// notation was introduced because it was determined that the pathlib-like API was not preferred by our "test users". The goal was to make lightning.storage.Path disappear so that there wouldn't be a confusion in the first place like you described.
There have been discussions to warn users when an attribute "looks like a path" but is not prefixed with lit://, but there are too many examples where false positives would be produced.
If we ever remove storage.Path, this issue doesn't have much meaning. But if we are keeping it, I still think it's useful to auto infer. But one thing to clarify, I didn't mean to convert all the pathlib.Path object to storage.Path. Only if that's starting with lit://
if isinstance(value, pathlib.Path):
if str(value).startswith("lit://"):
value = storage.Path(value)
That's true, this case can be covered! Sorry, I misread.
I definitely agree that storage.Path should be nuked after the Drive introduction. But @tchaton said he wants to keep it.
Still, all development currently revolves around Drive and even the docs for storage.Path explain Drive.
I agree with @hhsecond that we should auto-convert pathlib.Path to storage.Path if it starts with lit:// 👍