Phil Wang

Results: 1529 comments of Phil Wang

> So far at small scale (short B/32 run, batch size 16k), well-tuned lion slightly outperforms AdamW (still tuning AdamW).
>
> AdamW (LR 2e-3, WD 0.2, betas=0.9, 0.95)...
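
For reference, a minimal sketch of the two optimizer setups under comparison, using `torch.optim.AdamW` and the `lion-pytorch` package. The AdamW hyperparameters come from the quote above; the Lion values are illustrative assumptions following the common heuristic of a roughly 3-10x smaller learning rate and larger weight decay than AdamW.

```python
import torch
from torch import nn
from lion_pytorch import Lion  # pip install lion-pytorch

model = nn.Linear(512, 512)  # stand-in for the actual model being trained

# AdamW config from the quoted comment
adamw = torch.optim.AdamW(
    model.parameters(),
    lr = 2e-3,
    weight_decay = 0.2,
    betas = (0.9, 0.95),
)

# hypothetical Lion config for comparison (values assumed, not from the quote)
lion = Lion(
    model.parameters(),
    lr = 2e-4,            # assumed: ~10x smaller lr than AdamW
    weight_decay = 2.0,   # assumed: ~10x larger weight decay than AdamW
    betas = (0.9, 0.99),  # lion-pytorch defaults
)
```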

@mitchellnw wanted to thank you for running and sharing this btw! honestly, i was on the fence about this technique, but now i believe it should be used in the...

@xiangning-chen 👋 just heard another positive result this morning from someone trustworthy! 💯 while you are here, have you figured out which learning rate scheduler is optimal with Lion? it...

@xiangning-chen thank you for your recommendation!

yea, i'm starting to hear more negative reports coming in, unfortunately. the common story i hear is that it converges faster but generalizes worse.

@xiangning-chen oh, this is really interesting. what initial temperature value did the contrastive learning networks (LiT and BASIC) you tested on have?
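
For context, the "initial temperature" here is the learnable logit scale of a CLIP-style contrastive loss. Below is a minimal sketch of how it is typically parameterized; the 0.07 init is CLIP's published value, used purely as an example (what LiT and BASIC actually used is exactly what the comment above asks).

```python
import math
import torch
import torch.nn.functional as F
from torch import nn

class ContrastiveHead(nn.Module):
    def __init__(self, init_temperature = 0.07):  # 0.07 is CLIP's init, shown as an example
        super().__init__()
        # stored as a log-scale so it stays positive during optimization
        self.logit_scale = nn.Parameter(torch.tensor(math.log(1 / init_temperature)))

    def forward(self, image_embeds, text_embeds):
        image_embeds = F.normalize(image_embeds, dim = -1)
        text_embeds = F.normalize(text_embeds, dim = -1)
        # pairwise similarity logits, scaled by the learned inverse temperature
        logits = image_embeds @ text_embeds.t() * self.logit_scale.exp()
        labels = torch.arange(logits.shape[0], device = logits.device)
        # symmetric InfoNCE loss over image->text and text->image directions
        return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2
```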

@iejMac nice! i can contribute to this, i believe. for video, we can do much more aggressive patch dropout in the beginning. well, if the video does not resemble [this](https://www.youtube.com/watch?v=a2v7JK8c2fk)...
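
A minimal sketch of the patch dropout idea referenced above, assuming a ViT-style token sequence; the shapes and `keep_ratio` values are illustrative, not from the thread.

```python
import torch

def patch_dropout(tokens, keep_ratio = 0.25):
    # tokens: (batch, num_patches, dim); randomly keep a fraction of patch
    # tokens during training. video has far more tokens than images, so (as
    # the comment suggests) a more aggressive ratio can be used early on.
    batch, num_patches, _ = tokens.shape
    num_keep = max(1, int(num_patches * keep_ratio))

    # sample a random subset of patch indices per batch element
    rand = torch.rand(batch, num_patches, device = tokens.device)
    keep_indices = rand.topk(num_keep, dim = -1).indices

    batch_indices = torch.arange(batch, device = tokens.device)[:, None]
    return tokens[batch_indices, keep_indices]

# e.g. a short clip: 8 frames x 196 patches = 1568 tokens per sample (assumed shapes)
video_tokens = torch.randn(2, 8 * 196, 512)
kept = patch_dropout(video_tokens, keep_ratio = 0.1)  # aggressive dropout
print(kept.shape)  # torch.Size([2, 156, 512])
```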

![7baaai](https://user-images.githubusercontent.com/108653/219112616-6de940c7-fc76-4d6f-99c7-9278d14dd0e3.jpeg)

@iejMac nice! i'll do a code review later this week when i find some downtime

@rwightman sure, by modality, or by functionality, or both. either way is fine, just let me know