open_clip
Utility for syncing with S3 and loading checkpoints from S3
This PR introduces two additional arguments, --sync-s3 and --sync-s3-frequency. The recommended use is to pass --sync-s3 s3://<path-to-bucket> and --logs /scratch/logs, where the logs directory is hopefully on a local SSD.
Then, as you run, you should see logs at both /scratch/logs/<name> and s3://<path-to-bucket>/<name>. So you don't have to use the local file system.
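To illustrate the idea of syncing a local logs directory to S3 at a fixed frequency, here is a minimal sketch. It is not the code from this PR: the function names (build_sync_command, sync_once, sync_periodically), the injectable runner and sleep parameters, and the example paths are all hypothetical. It only assumes that the aws CLI is installed and that `aws s3 sync <source> <destination>` is the underlying command.

```python
import subprocess
import time
from typing import Callable, List, Optional


def build_sync_command(local_dir: str, remote_dir: str) -> List[str]:
    """Return the argv for `aws s3 sync <local_dir> <remote_dir>`."""
    return ["aws", "s3", "sync", local_dir, remote_dir]


def sync_once(local_dir: str, remote_dir: str,
              runner: Callable = subprocess.run) -> bool:
    """Run one sync; return True if the command exited with code 0."""
    cmd = build_sync_command(local_dir, remote_dir)
    try:
        result = runner(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    except FileNotFoundError:
        # The aws CLI is not installed or not on PATH.
        return False
    return result.returncode == 0


def sync_periodically(local_dir: str, remote_dir: str, frequency_s: float,
                      max_rounds: Optional[int] = None,
                      sleep: Callable[[float], None] = time.sleep,
                      runner: Callable = subprocess.run) -> int:
    """Sleep, then sync, repeatedly. Returns the number of successful syncs.

    With max_rounds=None this loops forever, so it is meant to run in a
    background process alongside training (an assumption of this sketch).
    """
    rounds = 0
    successes = 0
    while max_rounds is None or rounds < max_rounds:
        sleep(frequency_s)
        if sync_once(local_dir, remote_dir, runner=runner):
            successes += 1
        rounds += 1
    return successes
```

A call such as sync_periodically("/scratch/logs/<name>", "s3://<path-to-bucket>/<name>", 300) would then mirror the local directory roughly every five minutes. The 300-second value is only an example, not a default taken from the PR.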
This PR also supports loading from s3://<path-to-checkpoint>; it's a bit slow but not too bad.
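One way to picture loading a checkpoint from an s3:// path is sketched below. This is an assumption about how such loading could work, not the implementation in this PR: is_s3_path and read_checkpoint_bytes are hypothetical helpers. It relies on the aws CLI being installed and on `aws s3 cp <uri> -`, which streams an object to stdout.

```python
import subprocess


def is_s3_path(path: str) -> bool:
    """Return True for paths of the form s3://..."""
    return path.startswith("s3://")


def read_checkpoint_bytes(path: str) -> bytes:
    """Read a checkpoint file into memory from S3 or the local disk."""
    if is_s3_path(path):
        # `aws s3 cp <uri> -` writes the object to stdout instead of a file.
        result = subprocess.run(
            ["aws", "s3", "cp", path, "-"],
            stdout=subprocess.PIPE,
            check=True,  # raise CalledProcessError if the download fails
        )
        return result.stdout
    with open(path, "rb") as f:
        return f.read()
```

The returned bytes can be wrapped in io.BytesIO and handed to torch.load, which accepts file-like objects. Downloading the whole object before deserializing is one plausible reason remote loading feels slower than reading from local disk.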
What about using fsspec for the same thing, to avoid being locked into S3?
Can you add a comment in the README that it makes sense to use --sync-s3 with a local --logs?
Thanks for the comments, updated! The default is syncing via aws s3 sync.
@mitchellnw could you resolve the merge conflict?
do you think we should merge?
ah, yea I should make this work with the new auto-resume which is where the conflict is coming from (https://github.com/mlfoundations/open_clip/pull/303). then yes I think good to merge after that
merge conflict fixed but need to add support for the resume = 'latest' feature
Ok, should be good to go.
looks ok, let's go