composer
composer copied to clipboard
S3 Checkpoint Saving By URI
What does this PR do?
This PR enables checkpoint saving to S3 using a URI by doing the following:
- [x] parses
save_folder
and if it's URI withs3://
prefix, then creates anRemoteUploaderDownloader
object - [x] checks to make sure a
RemoteUploaderDownloader
object with an S3bucket_uri
is not already inloggers
(if so then doesn't use the URI and throws a warning) - [x] removes
save_remote_file_name
andsave_latest_remote_file_name
fromTrainer
- [x] unit tests
- [x] manual tests
What issue(s) does this change relate to?
fixes CO-1143
Before submitting
- [x] Have you read the contributor guidelines?
- [ ] Is this change a documentation change or typo fix? If so, skip the rest of this checklist.
- [x] Was this change discussed/approved in a GitHub issue first? It is much more likely to be merged if so.
- [ ] Did you update any related docs and document your change?
- [x] Did you update any related tests and add any new tests related to your change? (see testing)
- [x] Did you run the tests locally to make sure they pass?
- [x] Did you run
pre-commit
on your change? (see thepre-commit
section of prerequisites)
EDIT: disregard
Oh I see, the remote_file_name
and latest_remote_file_name
are left in to support wandb/libcloud/sftp? Do you think we can get rid of those args by using wandb://path
, libcloud://path
, and sftp://path
in this pr, and just require that the correct logger was created by the user in those cases (deferring automatic creation of those, but still removing the extra trainer arguments)?
Ok I thought about it some more and I'm ok leaving save_remote_file_name
in for now to simplify things, but it looks like if I pass s3://bucket/path/to/checkpoints/
as the save folder, the path/to/checkpoints
will only be used for the local path not the remote path. I think that path should be passed to s3 as the prefix?
Also, can you include abhi's request to add the pt
file extension to the remote files?
ok @dakinggg, I added support for empty path strings and I removed save_remote_file_name
and save_latest_remote_file_name
. PTAL