Mehdi Cherti

Results: 51 comments by Mehdi Cherti

Update: following this thread https://github.com/huggingface/accelerate/issues/807, full/partial locking now works. I am currently getting some throughput numbers with `mt5-xxl-ViT-G-14`
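For context, locking a tower in this setup amounts to freezing its parameters before the model is wrapped with FSDP. Below is a minimal sketch of the idea; the `lock_text_tower` helper and the `blocks` attribute are hypothetical names used for illustration, not the actual open_clip API.

```python
import torch.nn as nn

def lock_text_tower(text_tower: nn.Module, unlocked_layers: int = 0) -> None:
    # Freeze every parameter of the tower...
    for param in text_tower.parameters():
        param.requires_grad = False
    # ...then optionally unfreeze the last few transformer blocks
    # (`blocks` is assumed to be a ModuleList of transformer layers).
    if unlocked_layers > 0:
        for block in list(text_tower.blocks)[-unlocked_layers:]:
            for param in block.parameters():
                param.requires_grad = True
```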

Update: I mentioned earlier that training was hanging with a large number of nodes (e.g., 256 on JUWELS Booster). After checking with a lower number of nodes, it seems that the starting-up phase (before...

Hey @nkflash, thanks. I actually noticed that as well, even with smaller models; I am on it. EDIT: found a fix, will push soon

@nkflash I pushed the fix, could you please try again? I can confirm that it worked for me

Thanks @orchidmajumder, `use_orig_params` is working as expected. So with the PyTorch nightly, we can already use it. If we want to also support the current PyTorch stable version (1.13), wrapping layer...
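As a rough sketch of what `use_orig_params` looks like in practice with the FSDP API (the auto-wrap policy and the `block_cls` argument below are illustrative assumptions, not the exact open_clip integration):

```python
import functools
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy

def wrap_with_fsdp(model: nn.Module, block_cls: type) -> FSDP:
    # Assumes torch.distributed has already been initialized.
    auto_wrap_policy = functools.partial(
        transformer_auto_wrap_policy,
        transformer_layer_cls={block_cls},
    )
    return FSDP(
        model,
        auto_wrap_policy=auto_wrap_policy,
        # use_orig_params keeps the original (named) parameters visible,
        # which matters e.g. for per-parameter weight-decay groups;
        # it needs a recent nightly / PyTorch >= 2.0.
        use_orig_params=True,
    )
```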

Yes, I was thinking of that as well, but I saw that there is already `'logit_scale' in n` in the exclude condition
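For reference, that exclusion is the kind of predicate used when building optimizer parameter groups, so that `logit_scale` (along with norms and biases) gets no weight decay. A minimal sketch; the exact names in open_clip's training script may differ:

```python
import torch.nn as nn

def param_groups(model: nn.Module, weight_decay: float = 0.2):
    # Parameters matching the exclude predicate get zero weight decay.
    exclude = lambda n, p: (
        p.ndim < 2 or "bn" in n or "ln" in n or "bias" in n or "logit_scale" in n
    )
    named = [(n, p) for n, p in model.named_parameters() if p.requires_grad]
    no_decay = [p for n, p in named if exclude(n, p)]
    decay = [p for n, p in named if not exclude(n, p)]
    return [
        {"params": no_decay, "weight_decay": 0.0},
        {"params": decay, "weight_decay": weight_decay},
    ]
```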

@rwightman Thanks for the suggestion, I moved the code a bit earlier; it is fixed now.

Update: @rwightman @rom1504 @mitchellnw @gabrielilharco @JeniaJitsev just for info, regarding the starting-up phase I mentioned earlier (https://github.com/mlfoundations/open_clip/pull/358#issuecomment-1423851399), I found out that it is not only proportional to the number of nodes...

Update: as the problem with a large number of nodes is solved, the following are updated scaling plots up to 1024 GPUs: G-14: ![G14](https://user-images.githubusercontent.com/509507/223187324-27444863-cf96-41fd-b9de-15fb8c4dbdf3.jpg) I also tested freezing a subset of layers, with MT5-XXL...

Update: the first fully trained model with FSDP is finished. I started with a ViT-B/32 on LAION-400M, 32 epochs (96 GPUs, local batch size of 896, global batch size of 86016,...
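The global batch size is just the per-GPU batch size times the number of GPUs; a quick check of the figures above:

```python
num_gpus = 96
local_batch_size = 896
global_batch_size = num_gpus * local_batch_size
assert global_batch_size == 86016  # matches the reported global batch size
```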