Mitchell Wortsman
ah, yeah, I should make this work with the new auto-resume feature, which is where the conflict is coming from (https://github.com/mlfoundations/open_clip/pull/303). Then yes, I think it's good to merge after that.
Merge conflict fixed, but I still need to add support for the `resume = 'latest'` feature.
Ok, should be good to go.
This is a great idea, is anyone interested in making one of these? Are there any specific questions you have? Datasets can be in csv or webdataset format (see https://github.com/rom1504/img2dataset)...
the `p.ndim < 2` check should also cover `logit_scale`
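To illustrate why the `p.ndim < 2` check also covers `logit_scale`: the CLIP logit scale is a scalar parameter (`ndim == 0`), so an ndim-based split naturally places it in the no-weight-decay group alongside biases and norm gains. A minimal sketch (the model and hyperparameters here are illustrative, not the actual open_clip code):

```python
import torch

# Hypothetical tiny model standing in for the CLIP towers.
model = torch.nn.Linear(4, 4)
# logit_scale is a scalar parameter, so p.ndim == 0 < 2.
logit_scale = torch.nn.Parameter(torch.ones([]) * 2.659)

params = list(model.parameters()) + [logit_scale]
decay = [p for p in params if p.ndim >= 2]    # weight matrices
no_decay = [p for p in params if p.ndim < 2]  # biases, gains, logit_scale

optimizer = torch.optim.AdamW(
    [{"params": decay, "weight_decay": 0.2},
     {"params": no_decay, "weight_decay": 0.0}]
)
```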
You may be interested in https://github.com/mlfoundations/open_clip/pull/267
@usuyama yep! if you check out the pseudocode above, it doesn't really depend on how loss is implemented
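For reference, a hedged sketch of the feature-caching accumulation idea (names and structure are illustrative, not the actual open_clip implementation): first forward every micro-batch without gradients to cache features, then re-forward each micro-batch with gradients, splice it into the cache, and backprop. Any loss function that consumes the full feature batch works, which is why it doesn't depend on how the loss is implemented.

```python
import torch

def accum_step(model, batches, optimizer, loss_fn):
    # 1) Forward all micro-batches without grad, caching features.
    with torch.no_grad():
        cached = [model(x).detach() for x in batches]
    optimizer.zero_grad()
    # 2) Re-forward each micro-batch with grad; splice its fresh
    #    features into the cached ones so the loss sees the full batch.
    for i, x in enumerate(batches):
        feats = model(x)
        all_feats = torch.cat(cached[:i] + [feats] + cached[i + 1:])
        loss = loss_fn(all_feats)
        loss.backward()  # grads accumulate across micro-batches
    optimizer.step()
```

Note the gradient only flows through the freshly recomputed micro-batch on each inner iteration; summing over all iterations recovers the full-batch gradient contribution per micro-batch.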
Sounds good. Using `--accum-freq k` is just over `k` times slower than `--accum-freq 1`.
Here is a screenshot verifying that training on 8 gpus with per-gpu batch size 512 behaves the same as training on 4 gpus with per-gpu batch size 512 and accum...
> Cool! Is this an implementation of GradAccum in [BASIC](https://arxiv.org/pdf/2111.10050.pdf)?

Not exactly, but it looks like an overall similar approach.