composer icon indicating copy to clipboard operation
composer copied to clipboard

S3 Checkpoint Saving By URI

Open eracah opened this issue 1 year ago • 3 comments

What does this PR do?

This PR enables checkpoint saving to S3 using a URI by doing the following:

  • [x] parses save_folder and if it's URI with s3:// prefix, then creates an RemoteUploaderDownloader object
  • [x] checks to make sure a RemoteUploaderDownloader object with an S3 bucket_uri is not already in loggers (if so then doesn't use the URI and throws a warning)
  • [x] removes save_remote_file_name and save_latest_remote_file_name from Trainer
  • [x] unit tests
  • [x] manual tests

What issue(s) does this change relate to?

fixes CO-1143

Before submitting

  • [x] Have you read the contributor guidelines?
  • [ ] Is this change a documentation change or typo fix? If so, skip the rest of this checklist.
  • [x] Was this change discussed/approved in a GitHub issue first? It is much more likely to be merged if so.
  • [ ] Did you update any related docs and document your change?
  • [x] Did you update any related tests and add any new tests related to your change? (see testing)
  • [x] Did you run the tests locally to make sure they pass?
  • [x] Did you run pre-commit on your change? (see the pre-commit section of prerequisites)

eracah avatar Oct 12 '22 00:10 eracah

EDIT: disregard

Oh I see, the remote_file_name and latest_remote_file_name are left in to support wandb/libcloud/sftp? Do you think we can get rid of those args by using wandb://path, libcloud://path, and sftp://path in this pr, and just require that the correct logger was created by the user in those cases (deferring automatic creation of those, but still removing the extra trainer arguments)?

dakinggg avatar Oct 14 '22 02:10 dakinggg

Ok I thought about it some more and I'm ok leaving save_remote_file_name in for now to simplify things, but it looks like if I pass s3://bucket/path/to/checkpoints/ as the save folder, the path/to/checkpoints will only be used for the local path not the remote path. I think that path should be passed to s3 as the prefix?

Also, can you include abhi's request to add the pt file extension to the remote files?

dakinggg avatar Oct 14 '22 16:10 dakinggg

ok @dakinggg, I added support for empty path strings and I removed save_remote_file_name and save_latest_remote_file_name. PTAL

eracah avatar Oct 14 '22 23:10 eracah