Mitchell Wortsman

Results 88 comments of Mitchell Wortsman

ah, yea I should make this work with the new auto-resume which is where the conflict is coming from (https://github.com/mlfoundations/open_clip/pull/303). then yes I think good to merge after that

merge conflict fixed but need to add support for the resume = 'latest' feature

This is a great idea, is anyone interested in making one of these? Are there any specific questions you have? Datasets can be in csv or webdataset format (see https://github.com/rom1504/img2dataset)...

the `p.ndim < 2` check should also cover `logit_scale`
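To make that concrete, here is a minimal sketch of a dimension-based filter. It assumes `logit_scale` is stored as a scalar (0-dimensional) parameter, as in CLIP-style models; the `Param` class, the parameter names, and the `is_low_dim` helper are hypothetical stand-ins rather than the repository's code.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Hypothetical stand-in for a tensor parameter; only `ndim` matters here."""
    ndim: int


def is_low_dim(p):
    # Biases and norm gains are 1-D, a scalar is 0-D, so all of them pass.
    return p.ndim < 2


# Hypothetical parameter names, chosen only for illustration.
named_params = [
    ("visual.proj.weight", Param(ndim=2)),
    ("visual.proj.bias", Param(ndim=1)),
    ("ln_final.weight", Param(ndim=1)),
    ("logit_scale", Param(ndim=0)),  # assumption: scalar parameter
]

low_dim = [name for name, p in named_params if is_low_dim(p)]
high_dim = [name for name, p in named_params if not is_low_dim(p)]
```

Under that assumption, `logit_scale` lands in `low_dim` without a separate name-based check.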

You may be interested in https://github.com/mlfoundations/open_clip/pull/267

@usuyama yep! if you check out the pseudocode above, it doesn't really depend on how loss is implemented

Sounds good. Using `--accum-freq k` is just over `k` times slower than `--accum-freq 1`.
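For intuition on what the extra time buys, here is a rough sketch of the effective batch size under gradient accumulation. The `effective_batch_size` helper and the example numbers are hypothetical, and it assumes every optimizer step sees `accum_freq` micro-batches on each GPU.

```python
def effective_batch_size(world_size, per_gpu_batch_size, accum_freq=1):
    """Samples contributing to one optimizer step (illustrative helper)."""
    return world_size * per_gpu_batch_size * accum_freq


# Hypothetical configurations: halving the GPU count while doubling the
# accumulation frequency keeps the effective batch size unchanged.
full = effective_batch_size(world_size=8, per_gpu_batch_size=512, accum_freq=1)
half = effective_batch_size(world_size=4, per_gpu_batch_size=512, accum_freq=2)
```

Both configurations come out to 4096 samples per optimizer step, so the cost of accumulation is wall-clock time rather than a smaller batch.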

Here is a screenshot verifying that training on 8 gpus with per-gpu batch size 512 behaves the same as training on 4 gpus with per-gpu batch size 512 and accum...

> Cool! Is this an implementation of GradAccum in [BASIC](https://arxiv.org/pdf/2111.10050.pdf)?

Not exactly but it looks like an overall similar approach.