pytorch-lightning icon indicating copy to clipboard operation
pytorch-lightning copied to clipboard

Automatically infer if `lit://<path>/` string is wrapped with pathlib.Path

Open hhsecond opened this issue 3 years ago • 0 comments

🐛 Bug

Currently, either of these two objects needs to be passed to setattr for the framework to detect it as a storage path

  • A string that starts with lit:// (or whatever protocol we support)
  • A storage.Path (inherited from pathlib.Path) object itself

But, it's very easy to get confused with framework's custom storage.Path class with pathlib.Path class. We should also detect if pathlib.Path("lit://<path>") comes into setattr

To Reproduce

File attribute in the work1 is a Path object but this will throw json serializable error

from lightning.app import LightningFlow
from lightning.app import LightningWork
from lightning.app import LightningApp
import pathlib


class Work1(LightningWork):
    def __init__(self):
        super().__init__()
        self.file = None

    def run(self):
        print("Work1")
        self.file = pathlib.Path("lit://work1.txt")  # using pathlib.Path instead of storage.Path


class RootFlow(LightningFlow):
    def __init__(self):
        super().__init__()
        self.work1 = Work1()

    def run(self):
        self.work1.run()
        print(self.work1.file)


app = LightningApp(RootFlow())

Expected behavior

Since the Path object starts with lit:// it should auto convert to storeage.Path

Environment

  • Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
  • PyTorch Lightning Version (e.g., 1.5.0):
  • Lightning App Version (e.g., 0.5.2):
  • PyTorch Version (e.g., 1.10):
  • Python version (e.g., 3.9):
  • OS (e.g., Linux):
  • CUDA/cuDNN version:
  • GPU models and configuration:
  • How you installed PyTorch (conda, pip, source):
  • If compiling from source, the output of torch.__config__.show():
  • Running environment of LightningApp (e.g. local, cloud):
  • Any other relevant information:

Additional context

hhsecond avatar Aug 10 '22 06:08 hhsecond

@hhsecond I don't think so. The design was that the user explicitly expresses which filepaths should be considered artifacts and be uploaded. In that sense, pathlib.Path should be interpreted the same way as a regular string path and not be uploaded to S3. Only when a string is prefixed withlit:// or explicitly wrapped with lightning.storage.Path do we back up the files.

awaelchli avatar Aug 10 '22 10:08 awaelchli

The lit:// notation was introduced because it was determined that the pathlib-like API was not preferred by our "test users". The goal was to make lightning.storage.Path disappear so that there wouldn't be a confusion in the first place like you described.

awaelchli avatar Aug 10 '22 10:08 awaelchli

There have been discussions to warn users when an attribute "looks like a path" but is not prefixed with lit://, but there are too many examples where false positives would be produced.

awaelchli avatar Aug 10 '22 10:08 awaelchli

If we ever remove storage.Path, this issue doesn't have much meaning. But if we are keeping it, I still think it's useful to auto infer. But one thing to clarify, I didn't mean to convert all the pathlib.Path object to storage.Path. Only if that's starting with lit://

if isinstance(value, pathlib.Path):
    if str(value).startswith("lit://"):
        value = storage.Path(value)

hhsecond avatar Aug 10 '22 10:08 hhsecond

That's true, this case can be covered! Sorry, I misread.

awaelchli avatar Aug 10 '22 10:08 awaelchli

I definitely agree that storage.Path should be nuked after the Drive introduction. But @tchaton said he wants to keep it. Still, all development currently revolves around Drive and even the docs for storage.Path explain Drive.

awaelchli avatar Aug 10 '22 10:08 awaelchli

I agree with @hhsecond that we should auto-convert pathlib.Path to storage.Path if it starts with lit:// 👍

yurijmikhalevich avatar Aug 26 '22 15:08 yurijmikhalevich