
Utility for syncing with S3 and loading checkpoints from S3

Open mitchellnw opened this issue 2 years ago • 2 comments

This PR introduces two new arguments, --sync-s3 and --sync-s3-frequency. The recommended use is to pass --sync-s3 s3://<path-to-bucket> together with --logs /scratch/logs, where /scratch/logs is ideally on a local SSD.

Then, as you run, you should see logs at both /scratch/logs/<name> and s3://<path-to-bucket>/<name>, so the logs are also available without access to the local file system.
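For readers who want a picture of what periodic syncing could look like, here is a minimal sketch and not the code in this PR. It assumes the AWS CLI is installed and that the sync is a one-way `aws s3 sync` from the local log directory to the bucket. The helper names (`build_sync_command`, `sync_once`, `start_periodic_sync`) and the `interval_s` parameter are hypothetical; the unit used by --sync-s3-frequency is not stated here.

```python
import subprocess
import threading


def build_sync_command(local_dir, remote_dir):
    """Return the argv for a one-way `aws s3 sync` from local_dir to remote_dir."""
    return ["aws", "s3", "sync", local_dir, remote_dir]


def sync_once(local_dir, remote_dir, runner=subprocess.run):
    """Run a single sync and report whether the command exited with status 0.

    `runner` is injectable so the function can be exercised without the AWS CLI.
    """
    result = runner(
        build_sync_command(local_dir, remote_dir),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
    )
    return result.returncode == 0


def start_periodic_sync(local_dir, remote_dir, interval_s, runner=subprocess.run):
    """Sync every `interval_s` seconds on a daemon thread (hypothetical helper).

    Returns (stop_event, thread); call stop_event.set() to end the loop.
    """
    stop_event = threading.Event()

    def loop():
        # Event.wait returns False on timeout and True once the event is set,
        # so the loop body runs once per interval until stop is requested.
        while not stop_event.wait(interval_s):
            sync_once(local_dir, remote_dir, runner=runner)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return stop_event, thread
```

A training script following this pattern would start the thread before the training loop, set the stop event afterwards, and run one final `sync_once` so the last checkpoint reaches the bucket.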

This PR also supports loading from s3://<path-to-checkpoint> -- it's a bit slow but not too bad.
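One way to picture loading a checkpoint from an S3 path is to copy it to local disk first and then load it as usual. The sketch below assumes the AWS CLI is available and uses `aws s3 cp`; it is an illustration, not necessarily how this PR does it. `split_s3_uri` and `fetch_checkpoint` are hypothetical names.

```python
import os
import subprocess
import tempfile
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key/parts' into ('bucket', 'key/parts')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def fetch_checkpoint(path, runner=subprocess.run):
    """Return a local path for `path`, downloading it first if it is on S3.

    Local paths are returned unchanged. `runner` is injectable for testing.
    """
    if not path.startswith("s3://"):
        return path
    _, key = split_s3_uri(path)
    # Download into a fresh temporary directory, keeping the original file name.
    local_path = os.path.join(tempfile.mkdtemp(), os.path.basename(key))
    runner(["aws", "s3", "cp", path, local_path], check=True)
    return local_path
```

The returned path can then be handed to whatever loader the training code already uses. The download of the whole file before loading is one plausible reason such a path would feel slow.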

mitchellnw avatar Dec 23 '22 23:12 mitchellnw

What about using fsspec for the same thing, to avoid being locked in to S3?

rom1504 avatar Dec 24 '22 01:12 rom1504

Can you add a comment in the README that it makes sense to use --sync-s3 with a local --logs?

rom1504 avatar Dec 24 '22 01:12 rom1504

Thanks for the comments, updated! The default is syncing via aws s3 sync.

mitchellnw avatar Dec 25 '22 04:12 mitchellnw

@mitchellnw could you resolve the merge conflict?

Do you think we should merge?

rom1504 avatar Jan 07 '23 21:01 rom1504

ah, yea I should make this work with the new auto-resume which is where the conflict is coming from (https://github.com/mlfoundations/open_clip/pull/303). then yes I think good to merge after that

mitchellnw avatar Jan 07 '23 21:01 mitchellnw

merge conflict fixed but need to add support for the resume = 'latest' feature

mitchellnw avatar Jan 10 '23 23:01 mitchellnw

Ok, should be good to go.

mitchellnw avatar Jan 11 '23 00:01 mitchellnw

looks ok, let's go

rom1504 avatar Jan 20 '23 21:01 rom1504