composer icon indicating copy to clipboard operation
composer copied to clipboard

Object store renaming

Open dakinggg opened this issue 1 year ago • 9 comments

What does this PR do?

A bunch of renaming related to ObjectStore and object. See the commit names for the specific renames

What issue(s) does this change relate to?

Closes CO-1190 Closes CO-1191 Relates to CO-1192 (I don't think we should do this, left a comment on the JIRA)

Manual testing

Manually tested that s3 loading and storing work, both via yaml and script.

Before submitting

  • [x] Have you read the contributor guidelines?
  • [ ] Is this change a documentation change or typo fix? If so, skip the rest of this checklist.
  • [x] Was this change discussed/approved in a GitHub issue first? It is much more likely to be merged if so.
  • [x] Did you update any related docs and document your change?
  • [x] Did you update any related tests and add any new tests related to your change? (see testing)
  • [x] Did you run the tests locally to make sure they pass?
  • [x] Did you run pre-commit on your change? (see the pre-commit section of prerequisites)

dakinggg avatar Oct 14 '22 00:10 dakinggg

Hi Daniel! This is quite a lot of renaming. Though, I'm not convinced it's adding clarity. To me, the crux of this issue seems to be that data on a localdisk is a file and when that same data is uploaded to an object store it's an object (and of course the reverse). I get that we'd like to maintain some consistent naming, but I'm not convinced pretending s3 is a filesystem is the way. What was the reasoning behind this renaming? Perhaps @eracah knows?

A-Jacobson avatar Oct 14 '22 22:10 A-Jacobson

I sort of agree.. object stores are not file systems, and mixing the two (e.g. S3Filesystem) is confusing.

hanlint avatar Oct 14 '22 22:10 hanlint

Hi Daniel! This is quite a lot of renaming. Though, I'm not convinced it's adding clarity. To me, the crux of this issue seems to be that data on a localdisk is a file and when that same data is uploaded to an object store it's an object (and of course the reverse). I get that we'd like to maintain some consistent naming, but I'm not convinced pretending s3 is a filesystem is the way. What was the reasoning behind this renaming? Perhaps @eracah knows?

@A-Jacobson, see this list of gripes and this design doc for more context.

Basically, people got confused with the terms object and artifact so we wanted to make the naming a little friendlier. But definitely open to your input!

FWIW, I agree that we shouldn't use the term file system. In the design doc I had cloud_backend, which I agree should maybe be remote_backend

eracah avatar Oct 14 '22 22:10 eracah

Yeah, I was mostly following the design doc, which ended up being a ton of renaming. FWIW SFTP isn't technically an object store either and I chose file system because of this library (https://filesystem-spec.readthedocs.io/en/latest/) that we had talked about switching to at some point which calls everything a file system. And pytorch lightning (which uses fsspec) also refers to them that way https://pytorch-lightning.readthedocs.io/en/latest/common/remote_fs.html.

dakinggg avatar Oct 14 '22 22:10 dakinggg

That being said, I'm fine if the preference is to leave everything called object stores, and not go forward with this renaming.

dakinggg avatar Oct 14 '22 22:10 dakinggg

Thanks a lot for the context. Makes more sense now. I don't mean to come in and disrupt an already discussed design doc... but I'm going to sort of do it anyway. Sorry in advance.

I notice, in the PTL implementation the only different between a local and remote destination is the string prefix, not the actual argument name. This is also one of the popular requests in the list of gripes/grapes evan sent.

that would look something like this

Trainer(save_path='./my_directory/my_checkpoint.pt') # local file

Trainer(save_path='s3://my_bucket/my_checkpoint.pt') # s3

To me, this design more successfully abstracts away the filesystem/object store friction and argument bloat that users are having problems with.

So, what do you guys think of waiting until we adopt (or not) fsspec before you do all this renaming? It seems that if we do fsspec correctly we could delete most of the things you've renamed here.

from:

# Load Checkpoint
load_path: Optional[str] = None,
load_remote_filesystem: Optional[Union[RemoteFilesystem, LoggerDestination]] = None

# Save Checkpoint
save_folder: Optional[str] = None,
save_filename: str = 'ep{epoch}-ba{batch}-rank{rank}.pt',
save_remote_file_name: str = '{run_name}/checkpoints/ep{epoch}-ba{batch}-rank{rank}'
save_latest_filename: Optional[str] = 'latest-rank{rank}.pt',
save_latest_remote_file_name: Optional[str] = '{run_name}/checkpoints/latest-rank{rank}'

to

# Load Checkpoint
load_path: Optional[str] = None,

# Save Checkpoint
save_folder: Optional[str] = None,
save_filename: str = 'ep{epoch}-ba{batch}-rank{rank}.pt',
save_latest_filename: Optional[str] = 'latest-rank{rank}.pt',

A-Jacobson avatar Oct 15 '22 00:10 A-Jacobson

@A-Jacobson Thanks, no worries about disrupting, the design discussion happened before I joined Mosaic :) I'm fine waiting on this rename until (if) fsspec is adopted. The other changes you mentioned (switching to one file path instead of many and just using the prefix to determine local/remote) is in progress.

dakinggg avatar Oct 15 '22 00:10 dakinggg

@dakinggg haha thanks for your patience. I've been in computer vision land.

Given that the "switching to one file path instead of many and just using the prefix to determine local/remote" change is already underway and that some of these renamed args may get deleted. My opinion is you at the very least wait on that change. I don't think this is urgent enough to have two breaking api changes hit back to back especially if the 2nd one is going to overwrite the first one. What do you think?

A-Jacobson avatar Oct 15 '22 01:10 A-Jacobson

yeah, I agree it is not at all urgent, and am fine holding off.

dakinggg avatar Oct 15 '22 01:10 dakinggg