datacomp
--output_dir does not do the correct thing when it is a cloud path
The datacomp repo is cloud-path aware, but open_clip is not, so when we pass a cloud path like s3:// ... to the open_clip training code, it just creates a folder called s3 locally on the master node.
The correct thing to do here is to detect that it's a cloud path, give open_clip a temporary local directory instead, and enable remote_sync on open_clip so the results are copied to the cloud path.
- See https://github.com/mlfoundations/datacomp/issues/56#issuecomment-1734237927
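
A minimal sketch of the proposed fix, not the actual datacomp code. The helper names (`is_cloud_path`, `resolve_output_dir`, `build_extra_args`) and the set of schemes are hypothetical, and the `--logs` / `--remote-sync` argument names are assumptions about open_clip's training CLI that should be checked against the installed version.

```python
import tempfile
from urllib.parse import urlparse

# Hypothetical set of URL schemes that count as cloud storage.
CLOUD_SCHEMES = {"s3", "gs"}


def is_cloud_path(path: str) -> bool:
    """Return True if `path` uses one of the cloud URL schemes above."""
    return urlparse(path).scheme in CLOUD_SCHEMES


def resolve_output_dir(output_dir: str):
    """Return (local_dir, remote_target).

    For a cloud path, training writes to a fresh temporary local
    directory and the original path becomes the sync target.
    For a local path, it is used as-is and there is no sync target.
    """
    if is_cloud_path(output_dir):
        local_dir = tempfile.mkdtemp(prefix="datacomp_train_")
        return local_dir, output_dir
    return output_dir, None


def build_extra_args(output_dir: str):
    """Build the extra arguments to hand to the open_clip training code.

    The flag names here are assumed, not confirmed from the source.
    """
    local_dir, remote_target = resolve_output_dir(output_dir)
    args = ["--logs", local_dir]
    if remote_target is not None:
        args += ["--remote-sync", remote_target]
    return args
```

With this shape, a local `--output_dir` passes through unchanged, while a cloud path is never handed to open_clip as a directory name, so the stray local `s3` folder should not be created.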